I have two Xubuntu 64-bit VMs on VMware Server 1.8.
The first VM kept having problems while folding, so I copied the second one, but now the problem has come back.
The full log is below.
Anyone have a tip?
Code: Select all
[22:18:19] Completed 250000 out of 250000 steps (100%)
Writing final coordinates.
Average load imbalance: 127.3 %
Part of the total run time spent waiting due to load imbalance: 71.1 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Z 0 %
NOTE: 71.1 % performance was lost due to load imbalance
in the domain decomposition.
NOTE: 15 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)      (%)
       Time: 173757.000 173757.000    100.0
                       2d00h15:57
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     72.276      3.040      0.241     99.521
gcq#0: Thanx for Using GROMACS - Have a Nice Day
[22:18:21] DynamicWrapper: Finished Work Unit: sleep=10000
[22:18:25]
[22:18:25] Finished Work Unit:
[22:18:25] - Reading up to 21148416 from "work/wudata_03.trr": Read 21148416
[22:18:25] trr file hash check passed.
[22:18:25] - Reading up to 4533104 from "work/wudata_03.xtc": Read 4533104
[22:18:25] xtc file hash check passed.
[22:18:25] edr file hash check passed.
[22:18:25] logfile size: 188167
[22:18:25] Leaving Run
[22:18:26] - Writing 26014439 bytes of core data to disk...
[22:18:26] ... Done.
Error encountered before initializing MPICH
[22:18:31] - Shutting down core
[22:18:31]
[22:18:31] Folding@home Core Shutdown: FINISHED_UNIT
[22:21:57] CoreStatus = 64 (100)
[22:21:57] Unit 3 finished with 31 percent of time to deadline remaining.
[22:21:57] Updated performance fraction: 0.515123
[22:21:57] Sending work to server
[22:21:57] Project: 2672 (Run 0, Clone 110, Gen 165)
[22:21:57] + Attempting to send results [August 10 22:21:57 UTC]
[22:21:57] - Reading file work/wuresults_03.dat from core
[22:21:57] (Read 26014439 bytes from disk)
[22:21:57] Connecting to http://171.64.65.56:8080/
[22:45:06] Posted data.
[22:45:16] Initial: 0000; - Uploaded at ~18 kB/s
[22:45:16] - Averaged speed for that direction ~24 kB/s
[22:45:16] + Results successfully sent
[22:45:16] Thank you for your contribution to Folding@Home.
[22:45:16] + Number of Units Completed: 95
[22:45:19] - Warning: Could not delete all work unit files (3): Core file absent
[22:45:19] Trying to send all finished work units
[22:45:19] + No unsent completed units remaining.
[22:45:19] - Preparing to get new work unit...
[22:45:19] + Attempting to get work packet
[22:45:19] - Will indicate memory of 1004 MB
[22:45:19] - Connecting to assignment server
[22:45:19] Connecting to http://assign.stanford.edu:8080/
[22:45:19] Posted data.
[22:45:19] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[22:45:19] + News From Folding@Home: Welcome to Folding@Home
[22:45:20] Loaded queue successfully.
[22:45:20] Connecting to http://171.64.65.56:8080/
[22:45:25] Posted data.
[22:45:25] Initial: 0000; - Receiving payload (expected size: 4843462)
[22:45:41] - Downloaded at ~295 kB/s
[22:45:41] - Averaged speed for that direction ~220 kB/s
[22:45:41] + Received work.
[22:45:41] Trying to send all finished work units
[22:45:41] + No unsent completed units remaining.
[22:45:41] + Closed connections
[22:45:41]
[22:45:41] + Processing work unit
[22:45:41] At least 4 processors must be requested.
[22:45:41] Core required: FahCore_a2.exe
[22:45:41] Core found.
[22:45:41] Working on queue slot 04 [August 10 22:45:41 UTC]
[22:45:41] + Working ...
[22:45:41] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 04 -checkpoint 15 -forceasm -verbose -lifeline 5433 -version 624'
[22:45:41]
[22:45:41] *------------------------------*
[22:45:41] Folding@Home Gromacs SMP Core
[22:45:41] Version 2.07 (Sun Apr 19 14:51:09 PDT 2009)
[22:45:41]
[22:45:41] Preparing to commence simulation
[22:45:41] - Ensuring status. Please wait.
[22:45:50] - Assembly optimizations manually forced on.
[22:45:50] - Not checking prior termination.
[22:45:52] - Expanded 4842950 -> 24001453 (decompressed 495.5 percent)
[22:45:52] Called DecompressByteArray: compressed_data_size=4842950 data_size=24001453, decompressed_data_size=24001453 diff=0
[22:45:53] - Digital signature verified
[22:45:53]
[22:45:53] Project: 2675 (Run 0, Clone 59, Gen 117)
[22:45:53]
[22:45:53] Assembly optimizations on if available.
[22:45:53] Entering M.D.
[22:45:59] Using Gromacs checkpoints
NNODES=4, MYRANK=0, HOSTNAME=folding2
NNODES=4, MYRANK=1, HOSTNAME=folding2
NNODES=4, MYRANK=2, HOSTNAME=folding2
NODEID=0 argc=23
:-) G R O M A C S (-:
Groningen Machine for Chemical Simulation
:-) VERSION 4.0.99_development_20090307 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
:-) mdrun (-:
Reading file work/wudata_04.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 64
NODEID=1 argc=23
NNODES=4, MYRANK=3, HOSTNAME=folding2
NODEID=2 argc=23
NODEID=3 argc=23
Reading checkpoint file work/wudata_04.cpt generated: Thu May 14 19:58:43 2009
-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090307
Source code file: checkpoint.c, line: 1151
Fatal error:
Checkpoint file is for a system of 146859 atoms, while the current system consists of 146817 atoms
For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------
Thanx for Using GROMACS - Have a Nice Day
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4
gcq#0: Thanx for Using GROMACS - Have a Nice Day
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
[04:01:10] - Autosending finished units... [August 11 04:01:10 UTC]
[04:01:10] Trying to send all finished work units
[04:01:10] + No unsent completed units remaining.
[04:01:10] - Autosend completed