Problems existing in Zoltan.
This file was last updated on $Date$

-------------------------------------------------------------------------------
ERROR CONDITIONS IN ZOLTAN
When a processor returns from Zoltan to the application due to an error
condition, other processors do not necessarily return the same condition.
In fact, other processors may not know that the processor has quit Zoltan,
and may hang in a communication (waiting for a message that is not sent
due to the error condition).  The parallel error-handling capabilities of
Zoltan will be improved in future releases.
 | |
RCB/RIB ON ASCI RED
On ASCI Red, the number of context IDs (e.g., MPI communicators) is limited
to 8192.  The environment variable MPI_REUSE_CONTEXT_IDS must be set to
reuse the IDs; setting this variable, however, slows performance.
An alternative is to set the Zoltan parameter TFLOPS_SPECIAL to "1".  With
TFLOPS_SPECIAL set, communicators in RCB/RIB are not split and, thus, the
application is less likely to run out of context IDs.  However, ASCI Red
also has a bug that is exposed by TFLOPS_SPECIAL: when messages sent with
MPI_Send/MPI_Recv within RCB/RIB exceed MPI_SHORT_MSG_SIZE, MPI_Recv
hangs.  We do not expect these conditions to exist on future platforms and,
indeed, plan to make TFLOPS_SPECIAL obsolete in future versions of Zoltan
rather than re-work it with MPI_Irecv.  -- KDD 10/5/2004
-------------------------------------------------------------------------------
ERROR CONDITIONS IN OCTREE, PARMETIS AND JOSTLE
On failure, the OCTREE, ParMETIS and Jostle methods abort rather than return
error codes.
-------------------------------------------------------------------------------
 | |
ZOLTAN_INITIALIZE BUT NO ZOLTAN_FINALIZE
If Zoltan_Initialize calls MPI_Init, MPI_Finalize will never be called,
because there is no Zoltan_Finalize routine.  If the application uses MPI
and calls MPI_Init and MPI_Finalize itself, there is no problem.
-------------------------------------------------------------------------------
 | |
HETEROGENEOUS ENVIRONMENTS
Some parts of Zoltan currently assume that basic data types such as
integers and real numbers (floats) have identical representations
on all processors.  This may not be true in a heterogeneous
environment.  Specifically, the unstructured (irregular) communication
library is unsafe in a heterogeneous environment.  This problem
will be corrected in a future release of Zoltan for heterogeneous
systems.
-------------------------------------------------------------------------------
 | |
F90 ISSUES
Pacific Sierra Research (PSR) Vastf90 is not currently supported due to bugs
in the compiler with no known workarounds.  It is not known when or if this
compiler will be supported.

N.A. Software FortranPlus is not currently supported due to problems with the
query functions.  We anticipate that this problem can be overcome, and support
will be added soon.
-------------------------------------------------------------------------------
PROBLEMS EXISTING IN PARMETIS
(Reported to the ParMETIS development team at the University of Minnesota,
 metis@cs.umn.edu)

 | |
Name: Free-memory write in PartGeomKway
Version: ParMETIS 3.1.1
Symptom: Free-memory write reported by Purify and Valgrind for graphs with
         no edges.
Description:
  For input graphs with no (or, perhaps, few) edges, Purify and Valgrind
  report writes to already-freed memory, as shown below.
FMW: Free memory write:
  * This is occurring while in thread 22199:
        SetUp(void)    [setup.c:80]
        PartitionSmallGraph(void) [weird.c:39]
        ParMETIS_V3_PartGeomKway [gkmetis.c:214]
        Zoltan_ParMetis [parmetis_interface.c:280]
        Zoltan_LB      [lb_balance.c:384]
        Zoltan_LB_Partition [lb_balance.c:91]
        run_zoltan     [dr_loadbal.c:581]
        main           [dr_main.c:386]
        __libc_start_main [libc.so.6]
        _start         [crt1.o]
  * Writing 4 bytes to 0xfcd298 in the heap.
  * Address 0xfcd298 is at the beginning of a freed block of 4 bytes.
  * This block was allocated from thread -1781075296:
        malloc         [rtlib.o]
        GKmalloc(void) [util.c:151]
        idxmalloc(void) [util.c:100]
        AllocateWSpace [memory.c:28]
        ParMETIS_V3_PartGeomKway [gkmetis.c:123]
        Zoltan_ParMetis [parmetis_interface.c:280]
        Zoltan_LB      [lb_balance.c:384]
        Zoltan_LB_Partition [lb_balance.c:91]
        run_zoltan     [dr_loadbal.c:581]
        main           [dr_main.c:386]
        __libc_start_main [libc.so.6]
        _start         [crt1.o]
  * There have been 10 frees since this block was freed from thread 22199:
        GKfree(void)   [util.c:168]
        Mc_MoveGraph(void) [move.c:92]
        ParMETIS_V3_PartGeomKway [gkmetis.c:149]
        Zoltan_ParMetis [parmetis_interface.c:280]
        Zoltan_LB      [lb_balance.c:384]
        Zoltan_LB_Partition [lb_balance.c:91]
        run_zoltan     [dr_loadbal.c:581]
        main           [dr_main.c:386]
        __libc_start_main [libc.so.6]
        _start         [crt1.o]
Reported: 8/31/09, http://glaros.dtc.umn.edu/flyspray/task/50
Status:   Reported 8/31/09.

 | |
Name: PartGeom limitation
Version: ParMETIS 3.0, 3.1
Symptom: incorrect number of partitions when # partitions != # processors
Description:
  The ParMETIS method PartGeom produces only decompositions with one
  partition per processor.  The Zoltan parameters NUM_GLOBAL_PARTITIONS
  and NUM_LOCAL_PARTITIONS are ignored.
Reported: Not yet reported.
Status: Not yet reported.

 | |
Name: vsize array freed in ParMetis
Version: ParMETIS 3.0 and 3.1
Symptom: segmentation fault, core dump at run time
Description:
  When ParMETIS_V3_AdaptiveRepart is called with the vsize parameter,
  ParMetis tries to free the vsize array even if it was allocated in
  Zoltan.  Zoltan then tries to free vsize again later, resulting in a
  fatal error.  As a temporary fix, Zoltan never calls ParMetis with the
  vsize parameter.
Reported: 11/25/2003.
Status: Acknowledged by George Karypis.

 | |
Name: ParMETIS_V3_AdaptiveRepart and ParMETIS_V3_PartKWay crash
      for zero-sized partitions.
Version: ParMETIS 3.1
Symptom: run-time error "killed by signal 8" on DEC; FPE, divide-by-zero.
Description:
  Metis divides by partition size; thus, zero-sized partitions
  cause a floating-point exception.
Reported: 9/9/2003.
Status: ?

 | |
Name: ParMETIS_V3_AdaptiveRepart dies for zero-sized partitions.
Version: ParMETIS 3.0
Symptom: run-time error "killed by signal 8" on DEC; FPE, divide-by-zero.
Description:
   ParMETIS_V3_AdaptiveRepart divides by partition size; thus, zero-sized
   partitions cause a floating-point exception.  This problem is exhibited
   in the adaptive-partlocal3 tests.  The tests actually run on Sun and
   Linux machines (which don't seem to care about the divide-by-zero), but
   cause an FPE signal on DEC (Compaq) machines.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1, but a new problem appeared (see above).

Name: ParMETIS_V3_AdaptiveRepart crashes when the graph has no edges.
Version: ParMETIS 3.0
Symptom: floating-point exception, divide-by-zero.
Description:
   Divide-by-zero in ParMETISLib/adrivers.c, function Adaptive_Partition,
   line 40.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

Name: Uninitialized memory read in akwayfm.c.
Version: ParMETIS 3.0
Symptom: UMR warning.
Description:
   UMR in ParMETISLib/akwayfm.c, function Moc_KWayAdaptiveRefine, near line 520.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

Name: Memory leak in wave.c
Version: ParMETIS 3.0
Symptom: some memory is not freed.
Description:
   Memory leak in ParMETISLib/wave.c, function WavefrontDiffusion;
   memory for the following variables is not always freed:
   solution, perm, workspace, cand.
   We believe the early return near line 111 causes the problem.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

 | |
Name: tpwgts ignored for small graphs.
Version: ParMETIS 3.0
Symptom: incorrect output (partitioning)
Description:
   When ParMETIS_V3_PartKway is used to partition into partitions
   of unequal sizes, the input array tpwgts is ignored and
   uniform-sized partitions are computed.  This bug shows up when
   (a) the number of vertices is < 10000 and (b) only one weight
   per vertex is given (ncon=1).
Reported: Reported to George Karypis and metis@cs.umn.edu on 2002/10/30.
Status: Fixed in ParMetis 3.1.
 | |


Name: AdaptiveRepart crashes on partless test.
Version: ParMETIS 3.0
Symptom: run-time segmentation violation.
Description:
   ParMETIS_V3_AdaptiveRepart crashes with a SIGSEGV if
   the input array _part_ contains any value greater than
   the desired number of partitions, nparts.  This shows up
   in Zoltan's "partless" test cases.
Reported: Reported to George Karypis and metis@cs.umn.edu on 2002/12/02.
Status: Fixed in ParMetis 3.1.
 | |


Name: load imbalance tolerance
Version: ParMETIS 2.0
Symptom: missing feature
Description:
   The load-imbalance parameter UNBALANCE_FRACTION can
   only be set at compile time.  With Zoltan it is
   necessary to be able to set this parameter at run time.
Reported: Reported to metis@cs.umn.edu on 19 Aug 1999.
Status: Fixed in version 3.0.


 | |
Name: no edges
Version: ParMETIS 2.0
Symptom: segmentation fault at run time
Description:
   ParMETIS crashes if the input graph has no edges and
   ParMETIS_PartKway is called.  We suspect all the graph-based
   methods crash.  From the documentation it is unclear whether
   a NULL pointer is valid input for the adjncy array.
   Apparently, the bug occurs both with NULL as input and with
   a valid pointer to an array.
Reported: Reported to metis@cs.umn.edu on 5 Oct 1999.
Status: Fixed in version 3.0.
 | |


Name: no vertices
Version: ParMETIS 2.0, 3.0, 3.1
Symptom: segmentation fault at run time
Description:
   ParMETIS may crash if a processor owns no vertices.
   The extent of this bug is not known (i.e., which methods are affected).
   Again, it is unclear whether NULL pointers are valid input.
Reported: Reported to metis@cs.umn.edu on 6 Oct 1999.
Status: Fixed in 3.0 and 3.1 for the graph methods, but not the geometric
        methods.  New bug report sent on 2003/08/20.


Name: partgeom bug
Version: ParMETIS 2.0
Symptom: floating-point exception
Description:
   For domains where the global delta_x, delta_y, or delta_z (in 3D)
   is zero (e.g., all nodes lie along the y-axis), a floating-point
   exception can occur when the partgeom algorithm is used.
Reported: kirk@cs.umn.edu in Jan 2001.
Status: Fixed in version 3.0.
 | |

-------------------------------------------------------------------------------