Problems existing in Zoltan.
This file was last updated on $Date$
-------------------------------------------------------------------------------
ERROR CONDITIONS IN ZOLTAN
When a processor returns from Zoltan to the application due to an error
condition, other processors do not necessarily return the same condition.
In fact, other processors may not know that the processor has quit Zoltan,
and may hang in a communication (waiting for a message that is not sent
due to the error condition). The parallel error-handling capabilities of
Zoltan will be improved in future releases.
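
Until then, all-rank agreement on the outcome can be arranged by the
application itself. Below is a minimal sketch (ours, not part of Zoltan;
it assumes only the documented Zoltan_LB_Partition call and Zoltan's
convention that fatal return codes are negative). Note that it only
helps when every processor actually returns from Zoltan; it cannot
recover a processor that hangs inside a Zoltan communication.

  #include <mpi.h>
  #include <zoltan.h>

  /* Run partitioning, then reduce the return code so that every rank
     takes the same error path.  ZOLTAN_OK is 0 and the fatal codes
     (ZOLTAN_FATAL, ZOLTAN_MEMERR) are negative, so MPI_MIN yields the
     worst result seen on any rank. */
  static int partition_with_global_check(struct Zoltan_Struct *zz,
                                         MPI_Comm comm)
  {
    int changes, ngid, nlid, nimp, nexp, rc, worst;
    ZOLTAN_ID_PTR imp_gids = NULL, imp_lids = NULL;
    ZOLTAN_ID_PTR exp_gids = NULL, exp_lids = NULL;
    int *imp_procs = NULL, *imp_parts = NULL;
    int *exp_procs = NULL, *exp_parts = NULL;

    rc = Zoltan_LB_Partition(zz, &changes, &ngid, &nlid,
                             &nimp, &imp_gids, &imp_lids,
                             &imp_procs, &imp_parts,
                             &nexp, &exp_gids, &exp_lids,
                             &exp_procs, &exp_parts);
    MPI_Allreduce(&rc, &worst, 1, MPI_INT, MPI_MIN, comm);
    return worst;
  }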
-------------------------------------------------------------------------------
RCB/RIB ON ASCI RED
On ASCI Red, the number of context IDs (e.g., MPI Communicators) is limited
to 8192. The environment variable MPI_REUSE_CONTEXT_IDS must be set to
reuse the IDs; setting this variable, however, slows performance.
An alternative is to set the Zoltan parameter TFLOPS_SPECIAL to "1". With
TFLOPS_SPECIAL set, communicators in RCB/RIB are not split and, thus, the
application is less likely to run out of context IDs. However, ASCI Red
also has a bug that is exposed by TFLOPS_SPECIAL; when messages that use
MPI_Send/MPI_Recv within RCB/RIB exceed the MPI_SHORT_MSG_SIZE, MPI_Recv
hangs. We do not expect these conditions to exist on future platforms and,
indeed, plan to make TFLOPS_SPECIAL obsolete in future versions of Zoltan
rather than re-work it with MPI_Irecv. -- KDD 10/5/2004
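
For reference, a minimal sketch of selecting this behavior through
Zoltan's parameter interface (error checking omitted):

  float ver;
  struct Zoltan_Struct *zz;

  Zoltan_Initialize(argc, argv, &ver);
  zz = Zoltan_Create(MPI_COMM_WORLD);

  /* Do not split communicators inside RCB/RIB; this conserves context
     IDs on ASCI Red at the cost of exposing the MPI_SHORT_MSG_SIZE
     bug described above. */
  Zoltan_Set_Param(zz, "TFLOPS_SPECIAL", "1");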
-------------------------------------------------------------------------------
ERROR CONDITIONS IN OCTREE, PARMETIS AND JOSTLE
On failure, OCTREE, ParMETIS and Jostle methods abort rather than return
error codes.
-------------------------------------------------------------------------------
ZOLTAN_INITIALIZE BUT NO ZOLTAN_FINALIZE
If Zoltan_Initialize calls MPI_Init (because the application has not
already done so), then MPI_Finalize will never be called, as there is
no Zoltan_Finalize routine.
If the application uses MPI and calls MPI_Init and MPI_Finalize,
then there is no problem.
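
The safe pattern is for the application to own MPI's lifetime:

  #include <mpi.h>
  #include <zoltan.h>

  int main(int argc, char **argv)
  {
    float ver;

    MPI_Init(&argc, &argv);              /* application starts MPI ...  */
    Zoltan_Initialize(argc, argv, &ver); /* ... so Zoltan need not call
                                            MPI_Init itself             */

    /* ... create, use, and destroy Zoltan objects here ... */

    MPI_Finalize();                      /* application shuts MPI down  */
    return 0;
  }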
-------------------------------------------------------------------------------
HETEROGENEOUS ENVIRONMENTS
Some parts of Zoltan currently assume that basic data types like
integers and real numbers (floats) have identical representation
on all processors. This may not be true in a heterogeneous
environment. Specifically, the unstructured (irregular) communication
library is unsafe in a heterogeneous environment. This problem
will be corrected in a future release of Zoltan for heterogeneous
systems.
-------------------------------------------------------------------------------
F90 ISSUES
Pacific Sierra Research (PSR) Vastf90 is not currently supported due to bugs
in the compiler with no known workarounds. It is not known when or if this
compiler will be supported.
N.A.Software FortranPlus is not currently supported due to problems with the
query functions. We anticipate that this problem can be overcome, and support
will be added soon.
-------------------------------------------------------------------------------
PROBLEMS EXISTING IN PARMETIS
(Reported to the ParMETIS development team at the University of Minnesota,
metis@cs.umn.edu)

Name: Free-memory write in PartGeomKway
Version: ParMETIS 3.1.1
Symptom: Free-memory write reported by Purify and Valgrind for graphs with
no edges.
Description:
For input graphs with no (or, perhaps, few) edges, Purify and Valgrind
report writes to already freed memory as shown below.
FMW: Free memory write:
* This is occurring while in thread 22199:
SetUp(void) [setup.c:80]
PartitionSmallGraph(void) [weird.c:39]
ParMETIS_V3_PartGeomKway [gkmetis.c:214]
Zoltan_ParMetis [parmetis_interface.c:280]
Zoltan_LB [lb_balance.c:384]
Zoltan_LB_Partition [lb_balance.c:91]
run_zoltan [dr_loadbal.c:581]
main [dr_main.c:386]
__libc_start_main [libc.so.6]
_start [crt1.o]
* Writing 4 bytes to 0xfcd298 in the heap.
* Address 0xfcd298 is at the beginning of a freed block of 4 bytes.
* This block was allocated from thread -1781075296:
malloc [rtlib.o]
GKmalloc(void) [util.c:151]
idxmalloc(void) [util.c:100]
AllocateWSpace [memory.c:28]
ParMETIS_V3_PartGeomKway [gkmetis.c:123]
Zoltan_ParMetis [parmetis_interface.c:280]
Zoltan_LB [lb_balance.c:384]
Zoltan_LB_Partition [lb_balance.c:91]
run_zoltan [dr_loadbal.c:581]
main [dr_main.c:386]
__libc_start_main [libc.so.6]
_start [crt1.o]
* There have been 10 frees since this block was freed from thread 22199:
GKfree(void) [util.c:168]
Mc_MoveGraph(void) [move.c:92]
ParMETIS_V3_PartGeomKway [gkmetis.c:149]
Zoltan_ParMetis [parmetis_interface.c:280]
Zoltan_LB [lb_balance.c:384]
Zoltan_LB_Partition [lb_balance.c:91]
run_zoltan [dr_loadbal.c:581]
main [dr_main.c:386]
__libc_start_main [libc.so.6]
_start [crt1.o]
Reported: 8/31/09, http://glaros.dtc.umn.edu/flyspray/task/50
Status: Reported 8/31/09

Name: PartGeom limitation
Version: ParMETIS 3.0, 3.1
Symptom: inaccurate number of partitions when # partitions != # processors
Description:
The ParMETIS method PartGeom always produces exactly one partition per
processor; the Zoltan parameters NUM_GLOBAL_PARTITIONS and
NUM_LOCAL_PARTITIONS are therefore ignored.
Reported: Not yet reported.
Status: Not yet reported.

Name: vsize array freed in ParMetis
Version: ParMETIS 3.0 and 3.1
Symptom: seg. fault, core dump at runtime
Description:
When calling ParMETIS_V3_AdaptiveRepart with the vsize parameter,
ParMetis will try to free the vsize array even if it was
allocated in Zoltan. Zoltan will then try to free vsize again
later, resulting in a fatal error. As a temporary fix,
Zoltan will never call ParMetis with the vsize parameter.
Reported: 11/25/2003.
Status: Acknowledged by George Karypis.
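
The hazard, reduced to a self-contained sketch (library_call is a
hypothetical stand-in for ParMETIS_V3_AdaptiveRepart, not ParMETIS's
actual code):

  #include <stdlib.h>

  static void library_call(int *vsize)
  {
    /* ... library uses vsize ... */
    free(vsize);            /* bug: library frees caller-owned memory */
  }

  int main(void)
  {
    int *vsize = malloc(100 * sizeof *vsize);
    library_call(vsize);
    free(vsize);            /* caller frees again: undefined behavior */
    return 0;
  }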

Name: ParMETIS_V3_AdaptiveRepart and ParMETIS_V3_PartKway crash
for zero-sized partitions.
Version: ParMETIS 3.1
Symptom: run-time error "killed by signal 8" on DEC. FPE, divide-by-zero.
Description:
Metis divides by partition size; thus, zero-sized partitions
cause a floating-point exception.
Reported: 9/9/2003.
Status: ?
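
Until this is resolved, a caller-side guard is to keep every target
partition weight strictly positive. A sketch (the floor value is
illustrative; per the ParMETIS manual, tpwgts holds nparts*ncon
entries that sum to 1 for each constraint):

  #define TPWGT_FLOOR 1.0e-6f

  /* Clamp zero target partition weights to a small floor, then
     renormalize so each constraint's weights again sum to 1. */
  static void clamp_tpwgts(float *tpwgts, int nparts, int ncon)
  {
    int p, c;
    for (c = 0; c < ncon; c++) {
      float sum = 0.0f;
      for (p = 0; p < nparts; p++) {
        if (tpwgts[p * ncon + c] < TPWGT_FLOOR)
          tpwgts[p * ncon + c] = TPWGT_FLOOR;
        sum += tpwgts[p * ncon + c];
      }
      for (p = 0; p < nparts; p++)
        tpwgts[p * ncon + c] /= sum;
    }
  }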

Name: ParMETIS_V3_AdaptiveRepart dies for zero-sized partitions.
Version: ParMETIS 3.0
Symptom: run-time error "killed by signal 8" on DEC. FPE, divide-by-zero.
Description:
ParMETIS_V3_AdaptiveRepart divides by partition size; thus, zero-sized
partitions cause a floating-point exception. This problem is exhibited in
adaptive-partlocal3 tests. The tests actually run on Sun and Linux machines
(which don't seem to care about the divide-by-zero), but cause an FPE
signal on DEC (Compaq) machines.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1, but new problem appeared (see above).

Name: ParMETIS_V3_AdaptiveRepart crashes when no edges.
Version: ParMETIS 3.0
Symptom: Floating point exception, divide-by-zero.
Description:
Divide-by-zero in ParMETISLib/adrivers.c, function Adaptive_Partition,
line 40.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

Name: Uninitialized memory read in akwayfm.c.
Version: ParMETIS 3.0
Symptom: UMR warning.
Description:
UMR in ParMETISLib/akwayfm.c, function Moc_KWayAdaptiveRefine, near line 520.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

Name: Memory leak in wave.c
Version: ParMETIS 3.0
Symptom: Some memory not freed.
Description:
Memory leak in ParMETISLib/wave.c, function WavefrontDiffusion;
memory for the following variables is not always freed:
solution, perm, workspace, cand
We believe the early return near line 111 causes the problem.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

Name: tpwgts ignored for small graphs.
Version: ParMETIS 3.0
Symptom: incorrect output (partitioning)
Description:
When using ParMETIS_V3_PartKway to partition into partitions
of unequal sizes, the input array tpwgts is ignored and
uniform-sized partitions are computed. This bug shows up when
(a) the number of vertices is < 10000 and (b) only one weight
per vertex is given (ncon=1).
Reported: Reported to George Karypis and metis@cs.umn.edu on 2002/10/30.
Status: Fixed in ParMetis 3.1.

Name: AdaptiveRepart crashes on partless test.
Version: ParMETIS 3.0
Symptom: run-time segmentation violation.
Description:
ParMETIS_V3_AdaptiveRepart crashes with a SIGSEGV if
the input array _part_ contains any value greater than
the desired number of partitions, nparts. This shows up
in Zoltan's "partless" test cases.
Reported: Reported to George Karypis and metis@cs.umn.edu on 2002/12/02.
Status: Fixed in ParMetis 3.1.

Name: load imbalance tolerance
Version: ParMETIS 2.0
Symptom: missing feature
Description:
The load imbalance parameter UNBALANCE_FRACTION can
only be set at compile-time. With Zoltan it is
necessary to be able to set this parameter at run-time.
Reported: Reported to metis@cs.umn.edu on 19 Aug 1999.
Status: Fixed in version 3.0.

Name: no edges
Version: ParMETIS 2.0
Symptom: segmentation fault at run time
Description:
ParMETIS crashes if the input graph has no edges and
ParMETIS_PartKway is called. We suspect all the graph-based
methods crash. From the documentation it is unclear whether
a NULL pointer is valid input for the adjncy array.
Apparently, the bug occurs both with NULL as input and with
a valid pointer to an array.
Reported: Reported to metis@cs.umn.edu on 5 Oct 1999.
Status: Fixed in version 3.0.
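
For reference, a conservative way to describe an edgeless local graph
in the CSR form ParMETIS expects, sidestepping the NULL ambiguity (a
sketch assuming ParMETIS 3.x's idxtype):

  #include <stdlib.h>
  #include <parmetis.h>   /* defines idxtype in ParMETIS 3.x */

  /* nvtxs local vertices, no edges: every xadj entry is 0, so adjncy
     is never dereferenced; pass a non-NULL placeholder rather than
     relying on NULL being accepted. */
  int       nvtxs  = 100;
  idxtype  *xadj   = (idxtype *) calloc(nvtxs + 1, sizeof(idxtype));
  idxtype   dummy  = 0;
  idxtype  *adjncy = &dummy;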

Name: no vertices
Version: ParMETIS 2.0, 3.0, 3.1
Symptom: segmentation fault at run time
Description:
ParMETIS may crash if a processor owns no vertices.
The extent of this bug (i.e., which methods are affected) is not known.
Again, it is unclear whether NULL pointers are valid input.
Reported: Reported to metis@cs.umn.edu on 6 Oct 1999.
Status: Fixed in 3.0 and 3.1 for the graph methods, but not the geometric methods.
New bug report sent on 2003/08/20.

Name: partgeom bug
Version: ParMETIS 2.0
Symptom: floating point exception
Description:
For domains where the global delta_x, delta_y, or delta_z (in 3D)
is zero (e.g., all nodes lie along the y-axis), a floating point
exception can occur when the partgeom algorithm is used.
Reported: kirk@cs.umn.edu in Jan 2001.
Status: Fixed in version 3.0.
-------------------------------------------------------------------------------