==================
Usage and Examples
==================

This library is broken up into three main parts, along with a
compilation and linking framework:

#. :ref:`Core Examples`
#. :ref:`Array Examples`
#. :ref:`BLAS Examples`
#. :ref:`Compilation and Linking`

The ``Core.h`` header contains the necessary macros, flags and objects for interfacing with
basic kernel launching and the CUDA Runtime API. The ``Array.h`` header contains the ``CudaTools::Array``
class, which provides a device-compatible array-like class with easy memory management. The
``BLAS.h`` header provides BLAS functions through the cuBLAS library for the GPU
and Eigen for the CPU. Lastly, a templated Makefile is provided, which can be used
for your own project after following a few rules.

The usage of this library is illustrated through examples, and further details
can be found in the other sections. The examples are given in the `samples <https://git.acem.ece.illinois.edu/kjao/CudaTools/src/branch/main/samples>`__ folder.
Throughout this documentation, a few common terms appear. We refer to the CPU as the host and the GPU as the device. A host function is therefore
a function that is runnable on the CPU, and a device function is a function that is runnable
on the device. A kernel is a specific function that the host can call to be run on the device.

Core Examples
=============
The ``Core.h`` header mainly introduces compiler macros and a few classes that are used to improve the
syntax between host and device code. To define and call a kernel, there are a few
macros provided. For example,

.. code-block:: cpp

    DEFINE_KERNEL(add, int x, int y) {
        printf("Kernel: %i\n", x + y);
    }

    int main() {
        KERNEL(add, CudaTools::Kernel::basic(1), 1, 1); // Prints "Kernel: 2".
        return 0;
    }

The ``DEFINE_KERNEL(name, ...)`` macro takes in the function name and its arguments.
The second argument of the ``KERNEL()`` macro gives the launch parameters for the
kernel. The launch parameters have several items, but for 'embarrassingly parallel'
cases, we can simply generate the settings from the number of threads. More detail on
creating launch parameters can be found :ref:`here <CudaTools::Kernel::Settings>`. In the above example,
there is only one thread. The rest of the arguments are just the kernel arguments. For more detail,
see :ref:`here <Macros>`.

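To make the idea of generating launch settings from a thread count concrete, the host-only sketch below
shows the usual ceiling-division arithmetic for a one-dimensional launch. It is an illustration only:
``LaunchDims`` and ``makeBasicLaunch`` are hypothetical names, the block size of 256 is an assumed default,
and nothing here is taken from the implementation of ``CudaTools::Kernel::basic``.

.. code-block:: cpp

    #include <cstddef>

    // Hypothetical container for a 1D launch configuration (not a CudaTools type).
    struct LaunchDims {
        std::size_t blocks;          // Number of blocks in the grid.
        std::size_t threadsPerBlock; // Number of threads in each block.
    };

    // Hypothetical helper: cover `threads` work items with fixed-size blocks.
    // Ceiling division guarantees blocks * blockSize >= threads.
    LaunchDims makeBasicLaunch(std::size_t threads, std::size_t blockSize = 256) {
        const std::size_t blocks = (threads + blockSize - 1) / blockSize;
        return LaunchDims{blocks, blockSize};
    }

Under these assumptions, ``makeBasicLaunch(1000)`` gives 4 blocks of 256 threads, so a kernel launched this
way would still need an index check to skip the 24 surplus threads.
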
.. warning::
    These kernel definitions must be in a file that will be compiled by ``nvcc``. Also,
    for header files, there is an additional macro ``DECLARE_KERNEL(name, ...)`` to declare the kernel
    and make it available to other files.

Since many applications use classes, a macro is provided to make a class
device-compatible. Continuing in the style of the previous example,

.. code-block:: cpp

    class intPair {
        DEVICE_CLASS(intPair)
      public:
        int x, y;

        intPair(const int x_, const int y_) : x(x_), y(y_) {
            allocateDevice(); // Allocates memory for this intPair on the device.
            updateDevice().wait(); // Copies the memory on the host to the device and waits until finished.
        }

        HD void swap() {
            int tmp = x; // Named tmp so it does not shadow the member function swap().
            x = y;
            y = tmp;
        }
    };

    DEFINE_KERNEL(swap, intPair* const pair) { pair->swap(); }

    int main() {
        intPair pair(1, 2);
        printf("Before: %i, %i\n", pair.x, pair.y); // Prints "Before: 1, 2".

        KERNEL(swap, CudaTools::Kernel::basic(1), pair.that()).wait();
        pair.updateHost().wait(); // Copies the memory from the device back to the host and waits until finished.

        printf("After: %i, %i\n", pair.x, pair.y); // Prints "After: 2, 1".
        return 0;
    }

In this example, we create a class called ``intPair``, which is then made available on the device through
the ``DEVICE_CLASS(name)`` macro. Specifically, that macro introduces a few functions, such as
``allocateDevice()``, ``updateDevice()``, ``updateHost()``, and ``that()``. The last function
returns a pointer to the copy on the device. For more details, see :ref:`here <Device Class>`. If we were to pass the host pointer of the ``intPair`` to the kernel, there would be an illegal memory access.

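The relationship between the host object and its device copy can be pictured with a host-only analogy.
In the sketch below, a second heap object stands in for device memory and ``std::memcpy`` stands in for
the transfers. ``PairData``, ``MirroredPair`` and ``swapKernel`` are hypothetical names, and this is not how
``DEVICE_CLASS`` is implemented. It only shows why changes made through the ``that()`` pointer stay
invisible on the host until ``updateHost()`` is called.

.. code-block:: cpp

    #include <cstring>

    // Plain data that exists once on the "host" and once on the "device".
    struct PairData {
        int x, y;
    };

    // Hypothetical mirror class; a second heap object plays the role of device memory.
    class MirroredPair {
      public:
        PairData host; // The copy that host code reads and writes.

        MirroredPair(int x, int y) : host{x, y}, device(new PairData{0, 0}) {
            updateDevice(); // Start with both copies in agreement.
        }
        ~MirroredPair() { delete device; }
        MirroredPair(const MirroredPair&) = delete;
        MirroredPair& operator=(const MirroredPair&) = delete;

        // Stand-ins for host-to-device and device-to-host transfers.
        void updateDevice() { std::memcpy(device, &host, sizeof(PairData)); }
        void updateHost() { std::memcpy(&host, device, sizeof(PairData)); }

        // Pointer to the "device" copy, which is what a kernel would receive.
        PairData* that() { return device; }

      private:
        PairData* device;
    };

    // Stand-in for a kernel: it only touches the copy it was handed.
    void swapKernel(PairData* pair) {
        const int tmp = pair->x;
        pair->x = pair->y;
        pair->y = tmp;
    }

After ``swapKernel(p.that())`` the values in ``p.host`` are unchanged, and they only reflect the swap once
``p.updateHost()`` has run.
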
The kernel argument list **must** consist of pointers to objects or non-reference objects.
Otherwise, compilation will fail. In general this is safer, as it forces the programmer to
acknowledge that the device copy is being passed. For the latter case of a non-reference object,
you should only do this if there is no issue in creating a copy of the original object. In the above
example, we could have done this, but for more complicated classes it may result in unwanted behavior.

Lastly, since classes usually have member functions, any member function that should be
available on the device must be marked with the compiler macro ``HD`` in front.

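A macro of this kind is commonly written so that it expands to the CUDA qualifiers under ``nvcc`` and to
nothing under a host compiler. The sketch below shows that common pattern; ``MY_HD`` and ``PlainPair`` are
hypothetical names chosen to avoid clashing with the library, and the library's actual definition of ``HD``
may differ.

.. code-block:: cpp

    // __CUDACC__ is defined when nvcc compiles the file.
    #ifdef __CUDACC__
    #define MY_HD __host__ __device__ // Callable from both host and device code.
    #else
    #define MY_HD // A host-only build sees an ordinary member function.
    #endif

    // Hypothetical class used only to show where the macro goes.
    struct PlainPair {
        int x, y;

        MY_HD void swap() {
            const int tmp = x;
            x = y;
            y = tmp;
        }
    };

With this pattern the same class definition compiles unchanged for a CPU-only build.
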
We also introduce the ``wait()`` function, which waits for the command to complete before
continuing. Most calls that involve the device are asynchronous, so without proper blocking,
operations dependent on a previous command are not guaranteed to run correctly. If the code is
compiled for the CPU, then everything runs synchronously, as usual.

.. note::
    Almost all functions that are asynchronous provide an optional 'stream' argument,
    where you can give the name of the stream you wish to use. Different streams run
    asynchronously with respect to each other, but operations on the same stream are FIFO. To define a stream to use
    later, you must call ``CudaTools::Manager::get()->addStream("myStream")`` at some point
    before you use it. For more details, see :ref:`here <CudaTools::Manager>`.

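The ordering guarantees described above can be modelled with a small single-threaded toy. In this sketch,
``ToyStream`` is a hypothetical class, not a CudaTools type: commands are only queued when submitted and run
in submission order once ``wait()`` is called. A real stream executes on the device concurrently with the
host, so this model only illustrates two points: a result is not safe to read before waiting, and commands
on one stream keep their FIFO order.

.. code-block:: cpp

    #include <functional>
    #include <queue>
    #include <utility>

    // Hypothetical single-threaded model of one stream.
    class ToyStream {
      public:
        // Queue a command and return immediately; nothing has run yet.
        void enqueue(std::function<void()> command) {
            pending.push(std::move(command));
        }

        // Run every queued command in submission (FIFO) order.
        void wait() {
            while (!pending.empty()) {
                std::function<void()> command = std::move(pending.front());
                pending.pop();
                command();
            }
        }

      private:
        std::queue<std::function<void()>> pending;
    };

Two commands queued on the same ``ToyStream`` always run in the order they were submitted, while two
separate ``ToyStream`` objects impose no ordering on each other.
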
Array Examples
==============


BLAS Examples
=============


Compilation and Linking
=======================