This library is broken up into three main parts, along with a
compilation and linking framework:

#. :ref:`Core Examples`
#. :ref:`Array Examples`
#. :ref:`BLAS Examples`
#. :ref:`Compilation and Linking`

The ``Core.h`` header contains the necessary macros, flags and objects for interfacing with
basic kernel launching and the CUDA Runtime API. The ``Array.h`` header contains the ``CudaTools::Array``
class, which provides a device-compatible array-like class with easy memory management. The
``BLAS.h`` header provides BLAS functions through the cuBLAS library for the GPU
and Eigen for the CPU. Lastly, a templated Makefile is provided, which can be used
for your own project after following a few rules.

The usage of this library will be illustrated through examples, and further details
can be found in the other sections. The examples are given in the `samples <https://git.acem.ece.illinois.edu/kjao/CudaTools/src/branch/main/samples>`__ folder.

Throughout this documentation, a few common terms appear. We refer to the CPU as the host
and the GPU as the device. A host function is a function that is runnable on the CPU,
and a device function is a function that is runnable on the device. A kernel is a specific
function that the host can call to be run on the device.

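
As a rough illustration of these terms, the sketch below uses the standard CUDA function
qualifiers (``__host__``, ``__device__``, ``__global__``) rather than any CudaTools macro.
The function names are hypothetical, and the fallback definitions at the top are only an
assumption made so that the snippet also builds with a plain C++ compiler.

.. code-block:: cpp

    // When not compiling with nvcc, define the CUDA qualifiers away so that
    // this sketch also builds as ordinary C++ (illustration only).
    #ifndef __CUDACC__
    #define __host__
    #define __device__
    #define __global__
    #endif

    // Host function: runnable on the CPU only.
    __host__ int hostOnly(int x) { return x + 1; }

    // Function that is runnable on both the host and the device.
    __host__ __device__ int addBoth(int x, int y) { return x + y; }

    // Kernel: called by the host, executed on the device when built with nvcc.
    __global__ void addKernel(int x, int y, int* out) { *out = addBoth(x, y); }

When built with ``nvcc``, a kernel such as ``addKernel`` is launched from host code with
CUDA's launch syntax, which is what the kernel macros described in the following section wrap.
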
Core Examples
=============
The ``Core.h`` header mainly introduces compiler macros and a few classes that are used to improve the
syntax between host and device code. To define and call a kernel, there are a few
macros provided. For example,


.. code-block:: cpp

    DEFINE_KERNEL(add, int x, int y) {
        printf("Kernel: %i\n", x + y);
    }

    int main() {
        KERNEL(add, CudaTools::Kernel::basic(1), 1, 1); // Prints "Kernel: 2".
        return 0;
    }

The ``DEFINE_KERNEL(name, ...)`` macro takes in the function name and its arguments.
The second argument of the ``KERNEL()`` macro is the launch parameters for the
kernel. The launch parameters have several items, but for 'embarrassingly parallel'
cases, we can simply generate the settings from the number of threads. More detail on
creating launch parameters can be found :ref:`here <CudaTools::Kernel::Settings>`. In the above example,
there is only one thread. The remaining arguments are the kernel arguments. For more detail,
see :ref:`here <Macros>`.
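
To make the idea of generating launch parameters from a thread count more concrete, the
following is a minimal sketch in plain C++ of how such a mapping is commonly computed. It is
not the implementation of ``CudaTools::Kernel::basic``; the names ``LaunchShape`` and
``basicShape`` and the default limit of 1024 threads per block are assumptions made for
illustration.

.. code-block:: cpp

    #include <cstddef>

    // Hypothetical stand-in for launch parameters:
    // the number of blocks and the number of threads in each block.
    struct LaunchShape {
        std::size_t blocks;
        std::size_t threadsPerBlock;
    };

    // Covers `threads` threads (threads >= 1) using blocks of at most
    // `maxBlock` threads. The default of 1024 is an assumed limit.
    inline LaunchShape basicShape(std::size_t threads, std::size_t maxBlock = 1024) {
        if (threads <= maxBlock) {
            return {1, threads}; // Everything fits in a single block.
        }
        // Ceiling division, so that blocks * maxBlock >= threads.
        return {(threads + maxBlock - 1) / maxBlock, maxBlock};
    }

Under these assumptions, a single thread maps to one block of one thread. When the thread
count is not a multiple of the block size, the last block contains surplus threads, so a kernel
typically checks its index against the requested thread count before doing any work.
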

.. warning::

    These kernel definitions must be in a file that will be compiled by ``nvcc``. Also,
    for header files, there is an additional macro ``DECLARE_KERNEL(name, ...)`` to declare a kernel
    and make it available to other files.

Since many applications use classes, a macro is provided to 'convert' a class into