GU: parallel trouble
Dear all,
Much to my frustration, I have found that I am unable to get a parallel run
going between two machines I have here. I have installed TCGMSG without
problems, have a working .rhosts (I can rsh in every direction), and gamess.p
looks good as well (when starting a run, I see it report the correct hostname
and username, and so on). What really puzzles me is that when I go
from one IBM RS6000 (the 320, under AIX 3.2.5) to a SUN for parallel execution
of GAMESS, the parallel version works without problems.
When I start from the 3CT under AIX 4.1.4 and take the SUN along, all
goes fine as well.
When I combine the IBM 320 and the 3CT, nothing works as it should! As I
said, I can rsh from one to the other, and the pargms output seems to be
looking in the right places.
Pargms :
320> pargms test
----- GAMESS execution script -----
This job is running on host 320 at Wed Jun 5 13:58:11 DFT 1996
cp test.inp /u1/test.F05
setenv IRCDATA /u1/test.irc
setenv INPUT /u1/test.F05
setenv PUNCH /u1/test.dat
setenv AOINTS /u1/test.F08
setenv MOINTS /u1/test.F09
setenv DICTNRY /u1/test.F10
setenv DRTFILE /u1/test.F11
setenv CIVECTR /u1/test.F12
setenv NTNFMLA /u1/test.F13
setenv CIINTS /u1/test.F14
setenv WORK15 /u1/test.F15
setenv WORK16 /u1/test.F16
setenv CSFSAVE /u1/test.F17
setenv FOCKDER /u1/test.F18
setenv DASORT /u1/test.F20
setenv JKFILE /u1/test.F23
setenv ORDINT /u1/test.F24
setenv EFPIND /u1/test.F25
unset echo
parallel test
tmp = /u2/patrick/pdir/test.p
Creating: host=320, user=patrick,
file=/u3/gamess/pargamess.00.x, port=1031
Creating: host=3CT, user=patrick,
file=/u/patrick/progs/gamess/pargamess.00.x, port=1033
sock=4, pid=0, nread=-1, len=24
0: ReadXdrLong: ReadFromSocket failed -1 (0xffffffff).
system error message: Connection reset by peer
2: interrupt
WaitAll: Child (10937) finished, status=0x100 (exited with code 1).
WaitAll: No children or error in wait?
unset echo
----- accounting info -----
Wed Jun 5 13:58:13 DFT 1996
-rw-r--r-- 1 patrick sys 425650 Jun 5 13:58 /u1/test.F05
mv: 0653-401 Cannot rename /u1/test.dat to test.dat:
A file or directory in the path name does not exist.
Files from 3CT are:
/u/SCR/test.* not found
rm: /u/SCR/test.F*: No such file or directory
0.0u 1.0s 0:05 35% 83+97k 0+0io 3pf+0w
Now, what is wrong here? Am I missing something simple? The 3CT is
relatively new, so I am not yet familiar with the ins and outs of AIX 4.1.4.
My GAMESS version is pretty standard, with NBO plugged in. Everything works
stand-alone too!
Machine 1      Machine 2      OS                       Result
***************************************************************
IBM RS6K 320   IBM RS6K 3CT   AIX 3.2.5 - AIX 4.1.4    Trouble
IBM RS6K 320   SUN SPARC      AIX 3.2.5 - Solaris      OK
IBM RS6K 3CT   SUN SPARC      AIX 4.1.4 - Solaris      OK
Mind you, I also had PVM installed for a while to be able to do other
parallel jobs, and this worked like a charm on the IBM RS6K 320 / IBM RS6K
3CT combination.
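
In case it helps anyone comparing the per-host settings in the pargms output above, here is a minimal Python sketch that pulls the host, user, executable path and port out of the "Creating:" records. It is my own illustrative helper, not part of GAMESS or TCGMSG; the function name parse_creating and the assumed field layout are taken only from the two records shown in this one log, so other versions may print something different.

```python
import re

# Pattern for the "Creating:" records in the pargms output. The field order
# (host, user, file, port) is an assumption based on the log in this post.
# \s* also matches newlines, so records wrapped over two lines still parse.
CREATING = re.compile(
    r"Creating:\s*host=(?P<host>[^,\s]+),\s*user=(?P<user>[^,\s]+),\s*"
    r"file=(?P<file>[^,\s]+),\s*port=(?P<port>\d+)"
)


def parse_creating(log_text):
    """Return one dict per 'Creating:' record found in log_text."""
    records = []
    for match in CREATING.finditer(log_text):
        record = match.groupdict()
        record["port"] = int(record["port"])  # port is numeric in the log
        records.append(record)
    return records


if __name__ == "__main__":
    # The two records copied from the output above.
    sample = """\
Creating: host=320, user=patrick,
file=/u3/gamess/pargamess.00.x, port=1031
Creating: host=3CT, user=patrick,
file=/u/patrick/progs/gamess/pargamess.00.x, port=1033
"""
    for rec in parse_creating(sample):
        print(rec["host"], rec["user"], rec["file"], rec["port"])
```

Run on the log above, this lists the two hosts side by side; it only makes visible that the 320 and the 3CT are started from different executable paths and ports, and does not by itself say whether that is related to the failure.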
Patrick
*****************************************************************************
Patrick Bultinck                        Macrocycles Quantum Chemical
Ph. D. Student                          Calculations
Dept. Inorganic & Physical Chemistry
University of Ghent                     Tel. Int'l code/32/9/264.44.44
Krijgslaan 281 (S-3)                    Fax. Int'l code/32/9/264.49.83
9000 Gent                               E-mail : Patrick.Bultinck@rug.ac.be
Belgium                                 http://allserv.rug.ac.be/~pbultink/
*****************************************************************************