Scali MPI Connect™ Users Guide
Software release 4.4
7 September 2005 17:54
Acknowledgement
The development of Scali MPI Connect has benefited greatly from the work of people not connected to Scali. We wish especially to thank the developers of MPICH for their work, which served as a reference when implementing the first version of Scali MPI Connect.
SCALI “BRONZE” SOFTWARE CERTIFICATE (hereinafter referred to as the “CERTIFICATE”) issued by Scali AS, Olaf Helsets Vei 6, 0619 Oslo, Norway (h.
- “CANCELLATION PERIOD” shall mean the period between SHIPPING DATE AND INSTALLATION DATE, or if installation is not carried out, the period of 30 days after SHIPPING DATE, counted from the first NORWEGIAN WORKING DAYS after SHIPPING DATE.
www.scali.com/download free of charge. The Licensee may request such new REVISIONS and BUG FIXES of the RELEASE, and supplementary material thereof, .
III SCALI SERVICES TERMS
SCALI BRONZE SOFTWARE MAINTENANCE AND SUPPORT SERVICES
Unless otherwise specified in the purchase order placed by the LICEN.
related to, referring to or caused by SCALI SOFTWARE, then the LICENSEE shall pay SCALI’s standard commercial time rates for all off-site and eventually any on-site services provided plus actual travel and per diem expenses relating to such services.
fully obliged by the terms and conditions set out in this CERTIFICATE and SCALI’S prior written approval of the transfer.
Nothing in this CERTIFICATE shall be construed as;
- a warranty or representation by SCALI as to that anything made, used, so.
No action, whether in contract or tort (including negligence), or otherwise arising out of or in connection this CERTIFICATE may be brought more than six months after the cause of action has occurred.
No term or provision hereof shall be deemed waived and no breach excused unless such waiver or consent shall be in writing and signed by the party claimed to have waived or consented.
Table of contents
Chapter 1 Introduction
1.1 Scali MPI Connect product context
3.2.6 Notes on compiling with MPI-2 features
3.3 Running Scali MPI Connect programs
5.3.1 How to get expected performance
5.3.2 Memory consumption increase after warm-up
Chapter 1 Introduction
This manual describes Scali MPI Connect (SMC) in detail. SMC is sold as a separate stand-alone product, with an SMC distribution, and integrated with Scali Manage in the SSP distribution.
CPU-intensive parallel applications are programmed using a programming library called MPI (Message Passing Interface), the state-of-the-art library for high performance computing.
1.2.6 Licensing
SMC is licensed using the Scali license manager system. In order to run SMC a valid demo or a permanent license must be obtained. Customers with valid software maintenance contracts with Scali may request this directly from license@scali.
1.4 Acronyms and abbreviations
IA64: Instruction set Architecture 64. Intel 64-bit architecture, Itanium, EPIC
Infiniband: A high speed interconnect.
1.5 Terms and conventions
Unless explicitly specified otherwise, gcc (GNU C compiler) and bash (GNU Bourne-Again SHell) are used in all examples.
1.6 Typographic conventions
Term Description.
Chapter 2 Description of Scali MPI Connect
This chapter gives the details of the operations of Scali MPI Connect (SMC).
Figure 2-1 illustrates how applications started with mpimon have their communication system established by a system of daemons on the nodes.
library, which in turn may (e.g. Myrinet or SCI) or may not require a kernel driver (e.g. TCP/IP). These provider libraries provide a network device to SMC.
2.2.1 Network devices
There are two basic types of network devices in SMC, native and DAT.
2.2.3.2 DET
Scali has developed a device called Direct Ethernet Transport (DET) to improve Ethernet performance. This device bypasses the TCP/IP stack and uses raw Ethernet frames for sending messages.
• root# detstat -r det0 # resets statistics for the det0 device.
• root# detstat -r -a # resets statistics for all DET devices.
2.2.4 Myrinet
2.2.4.1 GM
This is an RDMA capable device that uses the Myricom GM driver and library.
2.2.6 SCI
This is a built-in device that uses the Scali SCI driver and library (ScaSCI). This driver is for the Dolphin SCI network cards. Please see the ScaSCI Release Notes for specific requirements.
Figure 2-4: Resources and communication concepts in Scali MPI Connect
2.3.2 Inlining protocol
With the in-lining protocol the application’s data is included in the message header.
2.3.5 Zerocopy protocol
The zerocopy protocol is a special case of the transporter protocol. It includes the same steps as a transporter except that data is written directly into the receiver's buffer instead of being buffered in the transporter-ringbuffer.
ROMIO is a high-performance, portable implementation of MPI-IO, the I/O chapter in MPI-2, and has become a de-facto standard for MPI-I/O (in terms of interface and semantics).
Chapter 3 Using Scali MPI Connect
This chapter describes how to set up, compile, link and run a program using Scali MPI Connect, and briefly discusses some useful tools for debugging and profiling.
3.2.2 Compiler support
Scali MPI Connect is a C library built using the GNU compiler. Applications can however be compiled with most compilers, as long as they are linked with the GNU runtime library.
3.2.5 Notes on compiling and linking on Power series
The Power series processors (PowerPC, POWER4 and POWER5) are both 32 and 64 bit capable. There are only 64 bit versions of Linux provided by SUSE and RedHat, and only a 64 bit OS is supported by Scali.
<pid> is the Unix process identifier of the monitor program mpimon. <nodename> is the name of the node where mpimon is running.
Note: SMC requires a homogenous file system image, i.
This control over placement of processes can be very valuable when application performance depends on all the nodes having the same amount of work to do.
3.3.
By default the processes’ output to stdout all appears in the stdout of mpimon, where it is merged in some random order. It is however possible to keep the outputs apart by directing them to files that have unique names for each process.
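The idea of one uniquely named output file per process can also be sketched from the application side. The C fragment below is only an illustration and is not the mpimon mechanism this guide refers to: the function names and the "<base>.<rank>" naming scheme are assumptions made for this sketch, and the rank is passed in as a plain integer (in an MPI program it would typically come from MPI_Comm_rank()).

```c
#include <stdio.h>

/* Build a per-process file name such as "run.out.3".
 * The "<base>.<rank>" scheme is an assumption for this sketch,
 * not an SMC convention.
 * Returns 0 on success, -1 if the name does not fit in buf. */
int rank_output_name(char *buf, size_t buflen, const char *base, int rank)
{
    int n = snprintf(buf, buflen, "%s.%d", base, rank);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Redirect the calling process's stdout to its own file.
 * Returns 0 on success, -1 on failure. */
int redirect_stdout_for_rank(const char *base, int rank)
{
    char name[256];

    if (rank_output_name(name, sizeof name, base, rank) != 0)
        return -1;
    return freopen(name, "w", stdout) != NULL ? 0 : -1;
}
```

A process would call redirect_stdout_for_rank() once, after it knows its rank and before it prints anything, so that later printf() output lands in its own file.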
For each MPI process SMC will try to establish contact with each other MPI process, in the order listed. This enables mixed interconnect systems, and provides a means for working around failed hardware.
<proc>: all (default), none, or MPI-process number(s).
-part <part> Use nodes from partition <part>
-q Keep quiet, no mpimon printout.
-t Test mode, no MPI program is started
<params> Parameters not recognized are passed on to mpimon.
As this feature is limited to TCP communication only, it will not have any effect when using native RDMA drivers such as Infiniband or Myrinet. Note that the combination of tfdr and failover mode is not supported in this version of Scali MPI Connect.
3.7.2 Built-in tools for debugging
Built-in tools for debugging in Scali MPI Connect cover discovery of the MPI calls used through tracing and timing, and an attachment point to processes that fault with segmentation violation.
3.8 Controlling communication resources
Even though it is normally not necessary to set buffer parameters when running applications, it can be done, e.g., for performance reasons.
3.9 Good programming practice with SMC
3.9.1 Matching MPI_Recv() with MPI_Probe()
During development and testing of SMC, Scali has come across several application programs with the following code sequence:
while (.
3.9.5 Unsafe MPI programs
Because of different buffering behavior, some programs may run with MPICH, but not with SMC.
3.11 Mpimon options
The full list of options accepted by mpimon is given below. To obtain the actual values used for a particular run include the -verbose option when starting the application.
3.11.1 Giving numeric values to mpimon
Numeric values can be given as mpimon options in the following way:
[<prefix>]&l.
Chapter 4 Profiling with Scali MPI Connect
The Scali MPI communication library has a number of built-in timing and trace facilities. These features are built into the run time version of the library, so no extra recompiling or linking of libraries is needed.
/* find the global sum of the squares */
MPI_Reduce( &my_sum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD );
/* let rank 0 compute the root mean.
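For readers who want to check the arithmetic behind the reduction in the fragment above without an MPI installation, the following serial C sketch may help. It is not the guide's example program: the function names are hypothetical, the partial sums are held in a plain array instead of being exchanged between processes, and long is used where the fragment uses MPI_INT.

```c
#include <stddef.h>

/* Sum of squares over one process's share of the data
 * (the role my_sum plays in the MPI fragment). */
long local_sum_of_squares(const int *values, size_t count)
{
    long sum = 0;

    for (size_t i = 0; i < count; i++)
        sum += (long)values[i] * values[i];
    return sum;
}

/* Serial stand-in for a sum reduction to rank 0: add up the
 * partial sums that each of nprocs processes would contribute. */
long reduce_sum(const long *partial, size_t nprocs)
{
    long total = 0;

    for (size_t i = 0; i < nprocs; i++)
        total += partial[i];
    return total;
}
```

With two simulated processes holding {1, 2} and {3, 4}, the partial sums are 5 and 25 and the reduced total is 30, which is the value rank 0 would go on to use.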
-t <call-list> Enable for MPI_calls in <call-list>. MPI_call = 'MPI_call' | 'call'
-x <call-list> Disable for MPI_calls in <call-list>.
0: MPI_Bcast root: 0 Id: 0
my_count = 32768
0: MPI_Scatter Id: 1
1: MPI_Init
1: MPI_Comm_rank Rank: 1
1: MPI_Comm_size Size: 2
1: MPI_Bcast root: 0 Id: 0
m.
From time to time it may be desirable or feasible to trace only one or a few of the processes. Specifying the "-p" option offers the ability to pick the processes to be traced. All MPI-calls are enabled for tracing by default.
1: MPI_Comm_rank 1 3.1us 3.1us 1 3.1us 3.1us
1: MPI_Comm_size 1 1.5us 1.5us 1 1.5us 1.5us
1: MPI_Gather 1 109.9us 109.9us 1 109.9us 109.9us
1: MPI_Init 1 1.0s 1.0s 1 1.0s 1.0s
1: MPI_Keyval_free 1 1.2us 1.2us 1 1.
"Receive lines" has the following fields:
<Comm><rank> recv from <from>(<worldFrom>): &.
user% SCAMPI_TIMING="-s 10" mpimon ./all2all -- r1 r2
produced a 158642 byte file
Digesting the massive information i.
4.5 Using SMC's built-in CPU-usage functionality
Scali MPI Connect has the capability to report wall clock time, and user and system CPU time on all processes with a built-in CPU timing facility.
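To make the three quantities concrete, here is a minimal C sketch of how a single process can measure its own wall clock, user CPU and system CPU time with the POSIX calls gettimeofday() and getrusage(). It is independent of SMC's built-in facility and says nothing about how that facility is implemented; the struct and function names are assumptions made for this illustration.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Wall clock, user CPU and system CPU readings, in seconds. */
struct cpu_usage {
    double wall;
    double user;
    double sys;
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Take a snapshot for the calling process.
 * Returns 0 on success, -1 on error. */
int cpu_usage_now(struct cpu_usage *out)
{
    struct timeval now;
    struct rusage ru;

    if (gettimeofday(&now, NULL) != 0 || getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->wall = tv_seconds(now);
    out->user = tv_seconds(ru.ru_utime);
    out->sys = tv_seconds(ru.ru_stime);
    return 0;
}

/* Difference between two snapshots: the cost of the code between them. */
struct cpu_usage cpu_usage_diff(struct cpu_usage start, struct cpu_usage end)
{
    struct cpu_usage d = { end.wall - start.wall,
                           end.user - start.user,
                           end.sys - start.sys };
    return d;
}
```

Taking one snapshot before a compute or communication phase and one after, then calling cpu_usage_diff(), gives per-phase figures. A wall clock time much larger than user plus system time usually means the process spent that phase waiting.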
Chapter 5 Tuning SMC to your application
Scali MPI Connect allows the user to exercise control over the communication mechanisms through adjustment of the thresholds that steer which mechanism to use for a particular message.
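A rough way to picture threshold-steered selection is a function that maps a message size to one of the protocols described in chapter 2 (inlining, transporter, zerocopy). The C sketch below is an assumption-laden illustration, not SMC's actual logic: the struct, the field names and the rule "small messages inline, large messages zerocopy, everything else transporter" are hypothetical, and the real library may use more protocols and different parameters.

```c
#include <stddef.h>

/* Protocols named in this guide; the real library may have more. */
enum protocol { PROTO_INLINE, PROTO_TRANSPORTER, PROTO_ZEROCOPY };

/* Hypothetical thresholds in bytes (names invented for this sketch). */
struct thresholds {
    size_t inline_max;    /* up to this size, data travels in the header */
    size_t zerocopy_min;  /* from this size, data bypasses the ring buffer */
};

/* Pick a protocol for a message of msg_bytes bytes. */
enum protocol select_protocol(size_t msg_bytes, const struct thresholds *t)
{
    if (msg_bytes <= t->inline_max)
        return PROTO_INLINE;
    if (msg_bytes >= t->zerocopy_min)
        return PROTO_ZEROCOPY;
    return PROTO_TRANSPORTER;
}
```

Under these assumed rules, raising inline_max moves more small messages onto the header-only path, while lowering zerocopy_min sends more mid-sized messages directly into the receiver's buffer. Whether either change helps depends on the application's message-size distribution.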
5.2 How to optimize MPI performance
There is no universal recipe for getting good performance out of a message passing program. Here are some do’s and don’ts for SMC.
5.3.2 Memory consumption increase after warm-up
Remember that group operations (MPI_Comm_{create, dup, split}) may involve creating new communication buffers. If this is a problem, decreasing chunck_size may help.
4 pair4
5 pipe0
6 pipe1
7 safe def
8 smp
By looping through these alternatives the performance of IS varies:
algorithm 0: Mop/s total = 95.60
algorithm 1: Mop/s total = 78.
Appendix A Example MPI code
A-1 Programs in the ScaMPItst package
The ScaMPItst package is installed together with installation of Scali MPI Connect. The package contains a number of programs in /opt/scali/examples with executable code in bin/ and source code in src/.
/* read the image */
for ( i = 0; i < numpixels; i ++ ) {
    fscanf( infile, "%u", &buffer );
    pixels[i] = (unsigned char)buffer;
}
fclose( i.
}
fflush( outfile );
fclose( outfile );
}
}
MPI_Finalize();
return 0;
}
A-2.1 File format
The code contains the logic to read and write images in .
Appendix B Troubleshooting
This appendix offers initial suggestions for what to do when something goes wrong with applications running together with SMC.
B-1.2 Why can I not start mpid?
mpid opens a socket and assigns a predefined mpid port number (see /etc/services for more information), to the end point. If mpid is terminated abnormally, the mpid port number cannot be re-used until a system defined timer has expired.
Appendix C Install Scali MPI Connect
Scali MPI Connect can be installed on clusters in one of two ways, either as part of installing clusters from scratch with Scali Manage, or by installing it on each particular node in systems that do not use Scali Manage.
C-2 Install Scali MPI Connect for TCP/IP
To install Scali MPI Connect for TCP/IP, please specify the -t option to smcinstall.
C-5 Install Scali MPI Connect for Infiniband
When installing for InfiniBand you must obtain a software stack from your vendor.
-n <hostname> - Specify hostname of Scali license server
This option tells the software which host to contact to check out a license. This can also be manually edited by modifying the scalm_net_server parameter in /opt/scali/etc/scalm.
C-11.1 Troubleshooting 3rdparty DAT providers
The only requirements are that the libraries have the proper permissions for shared objects, and that the /etc/dat.conf is formatted according to the standard.
Appendix D Bracket expansion and grouping
To ease usage of Scali software on large cluster configurations, many of the command line utilities have bracket expansion and grouping functionality.
List of figures
1-1 A cluster system
2-1 The way from application startup to execution
Index
Benchmarking ScaMPI
Communication protocols in ScaMPI
SCAMPI_INSTALL_SIGSEGV_HANDLER, builtin SIGSEGV handler
SCAMPI_NODENAME, set hostname