You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
138 lines
5.4 KiB
138 lines
5.4 KiB
2 years ago
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
||
|
<!-- saved from url=(0064)https://gamma.hdfgroup.org/papers/HISS/030821.IOFlow/IOFlow.html -->
|
||
|
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
||
|
<title>HDF5 Raw I/O Flow Notes</title>
|
||
|
|
||
|
<meta name="author" content="Quincey Koziol">
|
||
|
</head>
|
||
|
|
||
|
<body text="#000000" bgcolor="#FFFFFF">
|
||
|
|
||
|
<style type="text/css">
|
||
|
OL.loweralpha { list-style-type: lower-alpha }
|
||
|
OL.upperroman { list-style-type: upper-roman }
|
||
|
</style>
|
||
|
|
||
|
<center><h1>HDF5 Raw I/O Flow Notes</h1></center>
|
||
|
<center><h3>Quincey Koziol<br>
|
||
|
koziol@ncsa.uiuc.edu<br>
|
||
|
August 20, 2003
|
||
|
</h3></center>
|
||
|
|
||
|
<ol class="upperroman">
|
||
|
|
||
|
<li><h3><u>Document's Audience:</u></h3>
|
||
|
|
||
|
<ul>
|
||
|
<li>Current H5 library designers and knowledgeable external developers.</li>
|
||
|
</ul>
|
||
|
|
||
|
</li><li><h3><u>Background Reading:</u></h3>
|
||
|
|
||
|
</li><li><h3><u>Introduction:</u></h3>
|
||
|
|
||
|
<dl>
|
||
|
<dt><strong>What is this document about?</strong></dt>
|
||
|
<dd>This document attempts to supplement the flow charts describing
|
||
|
the flow of control for raw data I/O in the library.
|
||
|
</dd> <br>
|
||
|
</dl>
|
||
|
|
||
|
</li><li><h3><u>Figures:</u></h3>
|
||
|
<p>The following figures provide the main information:</p>
|
||
|
<table>
|
||
|
<tr><td><img src="IOFlow.gif" alt="High-Level View of Writing Raw Data" style="height:50%;"></td></tr>
|
||
|
<tr><td><img src="IOFlow2.gif" alt="Perform Serial or Parallel I/O" style="height:50%;"></td></tr>
|
||
|
<tr><td><img src="IOFlow3.gif" alt="Gather/Convert/Scatter" style="height:50%;"></td></tr>
|
||
|
</table>
|
||
|
|
||
|
</li><li><h3><u>Notes From Accompanying Figures:</u></h3>
|
||
|
|
||
|
<p>This section provides notes to augment the information in the accompanying
|
||
|
figures.
|
||
|
</p>
|
||
|
|
||
|
<ol>
|
||
|
<li><b>Validate Parameters</b> - Resolve any H5S_ALL parameters
|
||
|
for dataspace selections to actual dataspaces, allocate
|
||
|
conversion buffers, etc.
|
||
|
</li>
|
||
|
|
||
|
<li><b>Space Allocated in File?</b> - Space may not have been allocated
|
||
|
in the file to store the dataset data, if "late allocation" was chosen
|
||
|
for the allocation time when the dataset was created.
|
||
|
</li>
|
||
|
|
||
|
<li><b>Allocate & Fill Space</b> - These operations allocate both contiguous
|
||
|
and chunked dataset's space in the file. The chunked dataset space
|
||
|
allocation iterates through all the chunks in the file and allocates
|
||
|
both the B-tree information and the raw data in the file. Because of
|
||
|
the way filters work, fill-values are written out for chunked datasets
|
||
|
as they are allocated, instead of as a separate step.
|
||
|
In parallel
|
||
|
I/O, the chunked dataset allocation can potentially be time-consuming,
|
||
|
since all the raw data in the dataset is allocated from one process.
|
||
|
</li>
|
||
|
|
||
|
<li><b>Datatype Conversion Needed?</b> - This currently is the deciding
|
||
|
factor between doing "direct I/O" (in serial or parallel) and needing
|
||
|
to perform gather/convert/scatter operations. I believe that MPI
|
||
|
is capable of performing a limited range of type conversions and if so,
|
||
|
we should add support to detect when they can be used. This will
|
||
|
allow more I/O operations to be performed collectively.
|
||
|
</li>
|
||
|
|
||
|
<li><b>Collective I/O Requested/Allowed?</b> - A user has to both request
|
||
|
that collective I/O occur and also their I/O operation must meet the
|
||
|
requirements that the library sets for supporting collective parallel
|
||
|
I/O:
|
||
|
<ul>
|
||
|
<li>The dataspace must be scalar or simple (which is a no-op really,
|
||
|
since we don't support "complex" dataspaces in the library
|
||
|
currently).
|
||
|
</li>
|
||
|
<li>The selection must be regular. "all" selections
|
||
|
and hyperslab selections that were
|
||
|
made with only one call to H5Sselect_hyperslab() (i.e. not a
|
||
|
hyperslab selection that has been aggregated over multiple
|
||
|
selection calls) are regular. Supporting point and
|
||
|
irregular hyperslab selections are on the "to do" list.
|
||
|
</li>
|
||
|
<li>The dataset must be stored contiguously on disk (as shown in the
|
||
|
figure also). Supporting chunked dataset storage is also
|
||
|
on the "to do" list.
|
||
|
</li>
|
||
|
</ul>
|
||
|
</li>
|
||
|
|
||
|
<li><b>Build "chunk map"</b> - This step still has some scalability issues
|
||
|
as it creates a data structure that is proportional to the number of
|
||
|
chunks which will be written to, which could potentially be very large.
|
||
|
Building the "chunk map" information incrementally is on the "to do"
|
||
|
list also.
|
||
|
</li>
|
||
|
|
||
|
<li><b>Perform Chunked I/O</b> - As the figure shows, there is no support
|
||
|
for collective parallel I/O on chunked datasets currently. As noted
|
||
|
earlier, this is on the "to do" list.
|
||
|
</li>
|
||
|
|
||
|
<li><b>Perform "Direct" Serial I/O</b> - "Direct" serial I/O writes data
|
||
|
from the application's buffer, without any intervening buffer or memory
|
||
|
copies. For maximum efficiency and performance, the elements in the
|
||
|
selections should be adjoining.
|
||
|
</li>
|
||
|
|
||
|
<li><b>Perform Collective Parallel I/O</b> - This step also writes data
|
||
|
directly from an application buffer, but additionally uses collective
|
||
|
MPI I/O operations to combine the data from each process in the parallel
|
||
|
application in an efficient manner.
|
||
|
</li>
|
||
|
</ol>
|
||
|
|
||
|
</li></ol>
|
||
|
|
||
|
|
||
|
|
||
|
</body></html>
|