@ -21,7 +21,7 @@ for your own project, after following a few rules.
The usage of this library will be illustrated through examples, and further details
can be found in the other sections. The examples are given in the `samples <https://git.acem.ece.illinois.edu/kjao/CudaTools/src/branch/main/samples>`__ folder.
Throughout this documentation, there are a few common terms that may appear. First,we refer to the CPU as the host, and the GPU as the device. So, a host function refers
Throughout this documentation, there are a few common terms that may appear. First, we refer to the CPU as the host, and the GPU as the device. So, a host function refers
to a function runnable on the CPU, and a device function refers to a function that is runnable
on a device. A kernel is a specific function that the host can call to be run on the device.
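The three terms above can be sketched in plain CUDA C++. This is only an illustration of the terminology, not code taken from this library: the ``HOST_DEVICE`` macro and the names ``addValues``, ``addKernel``, and ``printSum`` are hypothetical, while ``__host__``, ``__device__``, and ``__global__`` are the standard CUDA qualifiers. The guard on ``__CUDACC__`` is an assumption made so that the same file also compiles with an ordinary C++ compiler.

```cpp
#include <cstdio>

// Hypothetical helper macro (not taken from the library): when compiled by nvcc,
// mark a function as callable from both host and device; otherwise expand to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// A function runnable on both the host (CPU) and the device (GPU).
HOST_DEVICE inline int addValues(int x, int y) { return x + y; }

#ifdef __CUDACC__
// A kernel: called by the host, executed on the device.
__global__ void addKernel(int x, int y, int* result) { *result = addValues(x, y); }
#endif

// A host function: runs only on the CPU.
inline void printSum(int x, int y) { std::printf("Host: %d\n", addValues(x, y)); }
```

Compiled without ``nvcc``, only the host paths exist, which is enough to check the shared logic on the CPU.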
@ -42,17 +42,17 @@ macros provided. For example,
return 0;
}
The ``DEFINE_KERNEL(name, ...)`` macro takes in the function name and its arguments.
The ``KERNEL(name, ...)`` macro takes in the function name and its arguments.
The second argument of the ``KERNEL()`` macro specifies the launch parameters for the
kernel. The launch parameters have several items, but for 'embarrassingly parallel'
cases, we can simply generate the settings with the number of threads. More detail with
cases, we can simply generate the settings with the number of threads using ``CudaTools::Kernel::basic``. More detail on
creating launch parameters can be found :ref:`here <CudaTools::Kernel::Settings>`. In the above example,
there is only one thread. The rest of the arguments are just the kernel arguments. For more detail,
see :ref:`here <Macro Functions>`.
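To give a feel for what generating launch settings from a thread count involves, here is a minimal sketch of the usual calculation. It is not the library's implementation: ``LaunchShape`` and ``makeBasicShape`` are hypothetical names, and the default limit of 1024 threads per block is an assumption that holds for current NVIDIA GPUs but should be queried from the device in real code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical container for a one-dimensional launch configuration.
struct LaunchShape {
    std::size_t blocks;           // number of thread blocks in the grid
    std::size_t threadsPerBlock;  // number of threads in each block
};

// Choose a block size no larger than the limit, then round the block count up
// so that blocks * threadsPerBlock covers every requested thread.
inline LaunchShape makeBasicShape(std::size_t numThreads,
                                  std::size_t maxThreadsPerBlock = 1024) {
    const std::size_t threadsPerBlock =
        std::max<std::size_t>(1, std::min(numThreads, maxThreadsPerBlock));
    const std::size_t blocks = (numThreads + threadsPerBlock - 1) / threadsPerBlock;
    return {blocks, threadsPerBlock};
}
```

Because the block count is rounded up, the last block may contain threads beyond the requested count, so a kernel launched this way typically checks its global index against the problem size before doing any work.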
.. warning::
These kernel definitions must be in a file that will be compiled by ``nvcc``. Also,
for header files, there is an additional macro ``DECLARE_KERNEL(name, ...)`` to declare it
for header files, there is an additional macro ``KERNEL(name, ...)`` to declare it
and make it available to other files.
Since many applications use classes, a macro is provided to 'convert' a class into
@ -192,7 +192,8 @@ situations and with the ``CudaTools::Kernel::basic()`` launch parameters. If com
mark the loop with ``#pragma omp parallel for`` and attempt to use OpenMP for parallelism.
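As a rough picture of what a CPU loop parallelized with OpenMP looks like, the sketch below uses the standard ``#pragma omp parallel for`` directive on an independent element-wise loop. The function ``scaleOnHost`` is a hypothetical example and not part of this library. If OpenMP is not enabled at compile time (for example, without ``-fopenmp`` on GCC or Clang), the pragma is ignored and the loop simply runs serially.

```cpp
#include <cstddef>
#include <vector>

// Multiply every element by a constant factor. Each iteration touches a
// different element, so the iterations can safely run in parallel.
inline void scaleOnHost(std::vector<double>& values, double factor) {
    // A signed loop index is used for compatibility with older OpenMP versions.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(values.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        values[static_cast<std::size_t>(i)] *= factor;
    }
}
```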
.. warning::
Notice that a view must be passed to the kernel, and not the original object. This
Notice that a view must be passed to the kernel, not the original object; otherwise, a copy
would be made.
The Array also supports other helpful functions, such as multi-dimensional indexing, slicing, and