/*
 * Copyright 2008 Sony Corporation of America
 *
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

/**
********************************************************************************
\mainpage MARS - Multicore Application Runtime System
********************************************************************************
<hr>
********************************************************************************
Copyright 2008 Sony Corporation of America

Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.2 published by the Free
Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no
Back-Cover Texts. A copy of the license is included in the section entitled "GNU
Free Documentation License".

DISCLAIMER

THIS DOCUMENT IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS
OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, OR TITLE;
THAT THE CONTENTS OF THE DOCUMENT ARE SUITABLE FOR ANY PURPOSE; NOR THAT THE
IMPLEMENTATION OF SUCH CONTENTS WILL NOT INFRINGE ANY THIRD PARTY PATENTS,
COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE DOCUMENT OR THE PERFORMANCE
OR IMPLEMENTATION OF THE CONTENTS THEREOF.

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
<b>Additional Resources</b>

- Future releases and other information for MARS:
 - ftp://ftp.infradead.org/pub/Sony-PS3/mars/

- Source repository for MARS:
 - http://git.infradead.org/ps3/mars-src.git
 - git://git.infradead.org/ps3/mars-src.git

- Send bug reports and other MARS inquiries to the cbe-oss-dev mailing list:
 - cbe-oss-dev@ozlabs.org
 - https://ozlabs.org/mailman/listinfo/cbe-oss-dev

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_toc Table of Contents

- \ref sec_1
 - \ref sec_1_1
 - \ref sec_1_2
 - \ref sec_1_3
 - \ref sec_1_4
 - \ref sec_1_5
- \ref sec_2
 - \ref sec_2_1
 - \ref sec_2_2
 - \ref sec_2_3
 - \ref sec_2_4
 - \ref sec_2_5
- \ref sec_3
 - \ref sec_3_1
 - \ref sec_3_2
 - \ref sec_3_3
- \ref sec_4
 - \ref sec_4_1
 - \ref sec_4_2
 - \ref sec_4_3
- \ref sec_5
 - \ref sec_5_1
 - \ref sec_5_2
 - \ref sec_5_3
 - \ref sec_5_4
- \ref sec_6
 - \ref sec_6_1
 - \ref sec_6_2
 - \ref sec_6_3
- \ref sec_7
 - \ref sec_7_1
 - \ref sec_7_2
 - \ref sec_7_3
 - \ref sec_7_4
 - \ref sec_7_5
 - \ref sec_7_6
- \ref sec_8
 - \ref sec_8_1
 - \ref sec_8_2
 - \ref sec_8_3
 - \ref sec_8_4
 - \ref sec_8_5
 - \ref sec_8_6
 - \ref sec_8_7
- \ref sec_9
 - \ref sec_9_1
 - \ref sec_9_2
 - \ref sec_9_3
 - \ref sec_9_4
 - \ref sec_9_5
 - \ref sec_9_6
 - \ref sec_9_7
 - \ref sec_9_8
- \ref sec_10

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_1 1 General Concepts

MARS (Multicore Application Runtime System) is a set of libraries that provides
an API to easily manage and create user programs that will be scheduled to run
on various microprocessing units of a multicore environment.

MARS assumes a target multicore architecture where there is a single host
processor (host) that is managing or controlling the execution of programs or
processes on 1 or more separate microprocessing units (MPUs).

MARS assumes a target audience of application developers focusing on multicore
architectures.

<b>Fig. 1</b>
\image html img_multicore.png

\n
********************************************************************************
\section sec_1_1 1.1 Host Processor (host)

The host processor (host) is the processor on which the host program will be
run.

The host program is responsible for the initialization of all sub programs to be
run on the various microprocessing units (MPU) available on the target multicore
architecture.

The memory area accessible by the host processor will be referred to as the host
storage.

\note

- In the Cell B.E. processor, the host processor is synonymous
with the PPU.

- In the Cell B.E. processor, the host storage is synonymous
with the PPU main storage.

\n
********************************************************************************
\section sec_1_2 1.2 Microprocessing Unit (MPU)

The microprocessing unit (MPU) is any one of many co-processors or DSPs of a
multicore architecture that will be responsible for the execution of a sub
program that may accomplish some form of processing or computation.

The MPU program is the sub program that is initialized by the host program and
executed on the MPU.

The MPU program should be in the ELF format. When the host program initializes
the MPU program for execution, it needs to know the address of the MPU program
ELF image in host storage. The procedure for loading the MPU program ELF image
into host storage is platform dependent and outside the scope of MARS.

The memory area accessible by the MPU will be referred to as the MPU storage.

\note

- In the Cell B.E. processor, the MPU is synonymous
with the SPU.

- In the Cell B.E. processor, the MPU storage is synonymous
with the SPU local storage.

- In the Cell B.E. processor, the MPU program ELF image is obtained through
the libspe2 API.

\n
********************************************************************************
\section sec_1_3 1.3 Multicore Programming Limitations

When programming for a multicore architecture, the following limitations become
apparent.

<b>(1) Memory Size of MPU Storage</b>

First, the memory size of the MPU storage is limited. As application processing
gets more complex and the code sizes of MPU programs get larger, the size of
the MPU programs offloaded to the MPUs may exceed the physical memory size of
the MPU storage.

If the size exceeds the memory size of the MPU storage, the offloaded MPU
processing must be partitioned into smaller pieces of code in order to reduce
the code size of each MPU program. As a result of this partitioning, some
collaborative processing, such as transferring computation results or waiting
for processing completion between the various MPU programs, becomes necessary.

<b>(2) Number of Physical MPUs</b>

Second, the number of physical MPUs is limited. Although multi-MPU
parallelization is required for many kinds of processing, the allowable number
of MPU processors is limited. If application processing is multi-threaded and
many different MPU programs are run simultaneously, the available MPUs will
quickly run out.

To run more MPU programs in parallel than there are physical MPUs available, we
need a mechanism to switch the currently running MPU programs depending on the
situation. Also, if programs interact with each other as in (1) above, the
program execution order must be considered when switching and running MPU
programs.

Thus, a complex mechanism to control the execution of MPU programs is required
for applications where multi-MPU parallelization is needed.

\n
********************************************************************************
\section sec_1_4 1.4 Host Centric Programming Model

Many multicore applications use a host processor centric programming model. This
means that the host processor is responsible for the loading/switching of MPU
programs as well as the sending/receiving of necessary data to those MPU
programs.

<b>Fig. 1.4</b>
\image html img_host_centric.png

As shown in <b>Fig. 1.4</b>, when using such a host processor centric
programming model, the host processor becomes heavily utilized managing all of
the MPU program execution and control operations. Not only does this tie up the
host processor and keep it from processing other tasks, but the MPU programs
also experience a decrease in performance as they wait for the host processor
to finish managing all of the MPU programs.

To have finer control over the execution of MPU programs with this host-centric
approach, the MPU control code in the host program (for loading/switching MPU
programs or handling interactions between MPU programs) becomes more complex.
Furthermore, this decreases the performance of the MPU programs, since they
must wait for the host processing to complete.

For example, while one piece of host control processing is running, other host
processing in the application may need to wait its turn until that processing
has completed. This delays the loading/switching of MPU programs and the
sending/receiving of data between MPU programs, and consequently the MPUs must
sit idle even though they are free to run other programs during that wait time.

This is the result of controlling the execution of MPU programs via the host
programs. If such host processing flow can be eliminated, and MPUs can directly
send/receive data and load/switch MPU programs for execution, the MPUs can be
used more efficiently.

\n
********************************************************************************
\section sec_1_5 1.5 MPU Centric Programming Model

MARS provides an MPU centric programming model. The host processor is not
responsible for the loading/switching of MPU programs or for sending/receiving
the necessary data to those MPU programs. The individual MPUs are responsible
for loading and executing MPU programs, and for switching out MPU programs as
necessary, without the need for the host processor.

<b>Fig. 1.5</b>
\image html img_mpu_centric.png

As shown in <b>Fig. 1.5</b>, loading/switching MPU programs and
sending/receiving data are performed independently of the host program. Making
the MPUs self managing removes the need to wait for the host processor to
finish MPU management, and thus increases MPU utilization and performance.

This also frees the host processor from much of the MPU management. However,
the host processor is still responsible for some of the setup necessary for MPU
program execution. Other operations that cannot be performed by the MPU, such
as file input/output, must also be processed by the host program.

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_2 2 MARS Concepts

The MARS library provides a runtime environment, the "Multicore Application
Runtime System", that allows MPU programs to run in parallel on multiple MPUs.

By using the MARS library, multiple MPU programs can be run cooperatively. This
means that applications that run a large number of MPU programs one after
another can be created without taking into account the physical number of MPUs
available, while leaving the responsibility of efficiently switching MPU
program execution up to the MARS library.

\n
********************************************************************************
\section sec_2_1 2.1 Kernel

The MARS kernel is loaded into each MPU storage and controls program execution
on that MPU. The kernel is responsible for scheduling workloads, loading them
and their parameters into MPU storage, and executing them on the MPUs.

The kernel is a relatively simple and small piece of code that stays resident on
each MPU's storage area. Each kernel has its own scheduler that determines which
workload to process. Based on the scheduled workload, the kernel will load the
necessary MPU program to MPU storage and execute it.

<b>Fig. 2.1</b>
\image html img_kernel.png

As shown in <b>Fig. 2.1</b>, the kernel has 3 basic states of operation. Once
loaded and started, the only responsibility of the kernel is to search for a
workload to schedule, jump execution to the MPU program of the workload,
context switch the workload if necessary, and then return to the scheduling
state.

The kernel is non-preemptive, so a workload executed on an MPU will continue to
run and use the MPU's resources until it finishes execution or enters a wait
state. When a workload enters a wait state, the kernel must handle the context
switch. This involves saving the workload context into host storage so that
execution can continue when the context is scheduled again at a later stage.

\n
********************************************************************************
\section sec_2_2 2.2 Workload Model

A workload model is the programming model that defines how MPU programs will
be processed and synchronized with each other.

A workload is the term used to refer to a single unit of an MPU program or
multiple MPU programs that must be scheduled for execution on the MPUs. The
actual design and behavior of how a workload will be processed after the
workload is scheduled by the kernel will vary based on the workload model.

MARS aims to provide various workload models not specific to just one.
Therefore an abstract MARS workload is necessary to accommodate various
workload models.

One example of a workload model may be a single large process that is executed
on a single MPU, while another example of a workload model may define a large
number of small processes that are executed on various MPUs.

<b>Fig. 2.2</b>
\image html img_workload.png

\note Currently MARS only supports the MARS task model. Therefore, when we refer
to the workload, it is synonymous with the MARS task. However, in the future the
workload may refer to some arbitrary workload model not yet defined.

\n
********************************************************************************
\section sec_2_3 2.3 Workload Module

The MARS workload module is the initial MPU program executed by the MARS kernel
when a workload is scheduled for execution and loaded to MPU storage.
The main responsibility of the workload module is to load and process the
necessary MPU program or programs as specified by the design of the workload
model, update the state of the workload, and return execution back to the MARS
kernel.

Each workload model needs its own workload module implementation that handles
the model specific processing of workloads. The workload module will make use of
the module API provided by the MARS kernel. These kernel system calls are only
accessible by the workload module. The interface between user programs and the
workload module is left up to the workload model design.

<b>Fig. 2.3</b>
\image html img_workload_module.png

\n
********************************************************************************
\section sec_2_4 2.4 Workload Queue

The MARS workload queue is created and initialized at MARS context creation and
resides in host storage. When workloads are created by the host program, they
are stored in this queue.

The MARS kernel is responsible for searching this queue for a schedulable
workload and, when one is found, loading it into MPU storage for processing.
Once the workload is loaded, the MARS kernel passes responsibility for workload
processing to the workload module specified in the workload.

When a workload is scheduled by the kernel, the workload's state within the
queue is set to a reserved state so no other kernel will attempt to schedule the
same workload.

Since this queue is shared by both host and MPU, its access is protected by
atomic operations.

<b>Fig. 2.4</b>
\image html img_workload_queue.png

\n
********************************************************************************
\section sec_2_5 2.5 Context

\copydoc group_mars_context

<b>Fig. 2.5</b>
\image html img_context.png

\note In <b>Fig. 2.5</b>, "other async process" means that the main
program can perform other processes asynchronously while the MPU programs are
being processed by the MARS kernels on the MPUs.

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_3 3 Overview of Usage

\n
********************************************************************************
\section sec_3_1 3.1 Host Library

The host program needs to make use of the host libraries provided by MARS.

Depending on the target platform, MARS should install the necessary host headers
and libraries to the appropriate host paths.

<b>Fig. 3.1</b>
\image html img_host_library.png

In order to use any of the host processor library API, the user must include the
necessary library API headers:

\code
	#include <mars/task.h>	/* header for task workload library API */
\endcode

The host program written for the host processor needs to link in the MARS host
libraries.

MARS provides both static and dynamic libraries for the host processor.

The following are the libraries for the MARS base and task workload model:

\code
	libmars_base.a		/* MARS base static library */
	libmars_base.so		/* MARS base dynamic library */

	libmars_task.a		/* MARS task static library */
	libmars_task.so		/* MARS task dynamic library */
\endcode

MARS provides these libraries for both 32-bit and 64-bit runtimes.

The actual procedure to compile a MARS host program and to link the MARS host
library may vary depending on the target platform.

\code
	/* Example host 32-bit compile on Cell B.E. platform */
	HOST_CC =	ppu-gcc
	HOST_CFLAGS =	-m32

	$(HOST_CC) $(HOST_CFLAGS) host_prog.c -lspe2 -lmars_task -lmars_base

	/* Example host 64-bit compile on Cell B.E. platform */
	HOST_CC =	ppu-gcc
	HOST_CFLAGS =	-m64

	$(HOST_CC) $(HOST_CFLAGS) host_prog.c -lspe2 -lmars_task -lmars_base
\endcode

\note

- Make sure the proper include and lib paths are set for the system.

- In the Cell B.E. compile example, libspe2 is linked in. This is required
because libspe2 provides the MPU program ELF image. This may vary depending on
the platform.

- Workload model libraries other than the task libraries may be provided in the
future. User implemented workload model libraries may also be used instead of
the task libraries.

\n
********************************************************************************
\section sec_3_2 3.2 MPU Library

The MPU program needs to make use of the MPU library provided by MARS.

Depending on the target platform, MARS should install the necessary MPU headers
and libraries to the appropriate MPU paths.

<b>Fig. 3.2</b>
\image html img_mpu_library.png

In order to use any of the MPU library API, the user must include the necessary
library API headers:

\code
	#include <mars/task.h>	/* header for task workload library API */
\endcode

The MPU program written for the MPU needs to link in the MARS
MPU library.

MARS provides only a static library for the MPU.

The following are the libraries for the MARS base and task workload model:

\code
	libmars_base.a		/* MARS base static library */

	libmars_task.a		/* MARS task static library */
\endcode

When compiling the MPU programs, it is also necessary to set the start address
of the '.init' section to the workload base address specified by the workload
model.

For example, a MARS task program should set the '.init' section start address
equal to \ref MARS_TASK_BASE_ADDR (currently 0x4000).

The actual procedure to compile a MARS MPU program and to link the MARS MPU
library may vary depending on the workload model and target platform.

\code
	/* Example MPU compile on Cell B.E. platform for task program */
	MPU_CC =	spu-gcc
	MPU_LD_FLAGS =	-Wl,-N -Wl,-gc-sections -Wl,--section-start,.init=0x4000

	$(MPU_CC) $(MPU_LD_FLAGS) mpu_prog.c -lmars_task -lmars_base
\endcode

\note

- Make sure the proper include and lib paths are set for the system.

- Workload model libraries other than the task libraries may be provided in the
future. User implemented workload model libraries may also be used instead of
the task libraries.

\n
********************************************************************************
\section sec_3_3 3.3 General Sequence

The general sequence of usage for MARS is described below:

1. Create a MARS context.\n
2. Create a MARS workload.\n
3. Process necessary synchronizations between host and MPU programs.\n
4. Process other host program tasks asynchronous to MPU processing.\n
5. Destroy the MARS workload instance (waits until MARS workload completion).\n
6. Destroy the MARS context.\n

<b>Fig. 3.3</b>
\image html img_sequence.png

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_4 4 Context Management

\n
********************************************************************************
\section sec_4_1 4.1 Context Overview

A MARS context must be created before any base MARS functionalities can be
utilized. The context creation should be the very first thing done by the host
program.

When all processing is completed, the host program must also be responsible for
destroying the created MARS context.

\see group_mars_context

\n
********************************************************************************
\section sec_4_2 4.2 Context Create

Typical MARS context creation code will look like below.

\code
	/* sample host processor side host_prog.c */

	struct mars_context *mars_ctx;		/* mars context pointer */

	/* Create a MARS context */
	int ret = mars_context_create(&mars_ctx, 0, 0);
	if (ret != MARS_SUCCESS)		/* error checking */
		return USER_DEFINED_ERROR;	/* create failed */
\endcode

<b>Context creation parameters:</b>

\code
int mars_context_create(
	struct mars_context **mars,
	uint32_t num_mpus,
	uint8_t shared);
\endcode

\b mars
\n
This is the address of the pointer to MARS context. A MARS context will be
allocated and its address stored in this pointer.

\b num_mpus
\n
This is the number of MPUs to be utilized by this MARS context. The number of
MPUs specified must be available on the system or an error is returned. You can
specify 0 to have MARS utilize all of the available MPUs for the context.

\b shared
\n
This specifies if you are requesting a shared context. If you request a shared
context, a global context is returned which can be shared by any libraries that
your application links to that also request a shared context.

\n
********************************************************************************
\section sec_4_3 4.3 Context Destroy

You must destroy any MARS contexts you created in order to properly free up the
resources used by MARS.

\code
	/* Destroy the MARS context previously created */
	ret = mars_context_destroy(mars_ctx);
	if (ret != MARS_SUCCESS)			/* error checking */
		return USER_DEFINED_ERROR;		/* destroy failed */
\endcode

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_5 5 Mutex Management

\n
********************************************************************************
\section sec_5_1 5.1 Mutex Overview

\copydoc group_mars_mutex

<b>Fig. 5.1</b>
\image html img_mutex.png

\see group_mars_mutex

\n
********************************************************************************
\section sec_5_2 5.2 Mutex Create

Typical MARS mutex creation code will look like below.

\code
	/* sample host processor side host_prog.c */

	struct mars_mutex *mutex;		/* mars mutex pointer */

	/* Create a MARS mutex */
	int ret = mars_mutex_create(&mutex);
	if (ret != MARS_SUCCESS)		/* error checking */
		return USER_DEFINED_ERROR;	/* create failed */
\endcode

\n
********************************************************************************
\section sec_5_3 5.3 Mutex Usage

In order to protect code blocks from simultaneous execution on any host program
thread or MPU, lock the mutex before the code block and unlock the mutex upon
completion of the code block. The lock call will return immediately if no other
entities have locked the mutex. Otherwise, the lock call will block until the
mutex becomes unlocked and it is able to successfully lock the mutex for itself.

\code
	/* Lock the MARS mutex previously created */
	ret = mars_mutex_lock(mutex);
	if (ret != MARS_SUCCESS)			/* error checking */
		return USER_DEFINED_ERROR;		/* lock failed */

	/* critical code section */

	/* Unlock the MARS mutex previously locked */
	ret = mars_mutex_unlock(mutex);
	if (ret != MARS_SUCCESS)			/* error checking */
		return USER_DEFINED_ERROR;		/* unlock failed */
\endcode

\note The mutex API access on the MPU-side is limited to the workload module
API (<b>See</b> \ref group_mars_workload_module ).

\n
********************************************************************************
\section sec_5_4 5.4 Mutex Destroy

You must destroy any MARS mutexes you created in order to properly free up the
resources they use.

\code
	/* Destroy the MARS mutex previously created */
	ret = mars_mutex_destroy(mutex);
	if (ret != MARS_SUCCESS)			/* error checking */
		return USER_DEFINED_ERROR;		/* destroy failed */
\endcode

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_6 6 Workload Model Management

\n
********************************************************************************
\section sec_6_1 6.1 Workload Model Overview

MARS provides the ability to support various workload models.
The MARS base library provides the MARS kernel and all of the APIs necessary to
implement a workload model. The MARS base library and kernel on their own are
only responsible for scheduling abstract workload contexts from the workload
queue. In order for a workload context to be managed and executed, a workload
module specific to each workload model must be provided.

\note Currently MARS provides the task workload model. However, more workload
models may be implemented in the future. It is also possible for users to
implement their own workload model and use it specifically for their own
programs.

A workload model typically will consist of the following 3 major components:

\n
<b>1. Workload Model Host Library</b>

The host-side library of the workload model will provide the user with the
interface to create workload contexts and add them to the workload queue so that
the workload can be scheduled for execution by the MARS kernel.

It is the responsibility of this host-side library to populate the contents of
the workload context structure with all the necessary information specific to
the workload model design.

<b>Fig. 6.1a</b>
\image html img_workload_model_host_library.png

<b>Fig. 6.1a</b> shows that for the task workload model, the host program
depends on the MARS task host library and the MARS base host library.

\n
<b>2. Workload Model MPU Library</b>

The MPU-side library of the workload model will provide the user with the
interface to handle any workload model specific functionalities.

It is the responsibility of this MPU-side library to handle any processing of
the workload specific to the workload model design. This library will also need
to call into the workload module implemented specifically for the workload
model. It is left completely up to the design of each workload model as to what
interfaces should be provided between the workload module and MPU-side library.

<b>Fig. 6.1b</b>
\image html img_workload_model_mpu_library.png

<b>Fig. 6.1b</b> shows that for the task workload model, the MPU task program
depends on the MARS task MPU library and the MARS base MPU library.

\n
<b>3. Workload Model Module</b>

The workload module is the MPU program that is loaded and executed by the MARS
kernel when a specific workload context is scheduled and ready to be executed.
Each workload context needs to know the corresponding workload module that will
be responsible for the execution and management of the workload.

The workload module will remain resident in the MPU storage as long as the
workload it is responsible for remains in the running state. Its main function
is to load and execute the MPU program specified by the currently scheduled
workload context. The workload module also serves as the communication layer
between the user's workload specific MPU program and the MARS kernel.

\note The workload module must be run-complete, meaning once the workload module
returns execution back to the MARS kernel, it will not resume execution. Each
time the workload module runs, it begins execution from its entry point. If
workloads must be resumed, it is the workload module's responsibility to save
the workload's program state and store any necessary information into the
workload context.

<b>Fig. 6.1c</b>
\image html img_workload_model_module.png

<b>workload module entry</b>\n
The entry point for the workload module must be <b>mars_module_entry</b>.

<b>workload module base address</b>\n
The workload module is loaded into MPU storage by the MARS kernel at the address
specified by \ref MARS_WORKLOAD_MODULE_BASE_ADDR. The size of the workload
module varies for each workload model implementation. Therefore, each workload
model will have the workload module load the workload program to a different
address in MPU storage.

<b>workload module stack</b>\n
The stack symbol for the workload module stack must also be specified. The
stack address should be immediately below the base address of the workload
program that the workload module will load and execute.

Example of how to compile a workload module on a Cell B.E. platform:

\code
	/* Example MPU compile on Cell B.E. platform for workload module */
	MPU_CC =	spu-gcc
	MPU_LD_FLAGS =	-Wl,-N -Wl,-gc-sections \
			-Wl,--entry,mars_module_entry -Wl,-u,mars_module_entry \
			-Wl,--section-start,.init=0x3000 \
			-Wl,--defsym=__stack=0x3ff0

	$(MPU_CC) $(MPU_LD_FLAGS) workload_module.c -lmars_base
\endcode

\n
********************************************************************************
\section sec_6_2 6.2 Workload Queue API (host)

\copydoc group_mars_workload_queue

<b>Fig. 6.2</b>
\image html img_workload_model_host_sequence.png

<b>Fig. 6.2</b> above shows a sample sequence of how the workload queue API
can be used to implement the task workload model's host library.

\see group_mars_workload_queue

\n
********************************************************************************
\section sec_6_3 6.3 Workload Module API (MPU)

\copydoc group_mars_workload_module

<b>Fig. 6.3</b>
\image html img_workload_model_mpu_sequence.png

<b>Fig. 6.3</b> above shows a sample sequence of how the workload module API
can be used to implement the task workload model's MPU library and task workload
module.

\see group_mars_workload_module

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_7 7 Task Management

\n
********************************************************************************
\section sec_7_1 7.1 Task Overview

\copydoc group_mars_task

<b>Fig. 7.1</b>
\image html img_task_overview.png

As shown in <b>Fig. 7.1</b>, the MARS kernel switches which MPU task programs
are being executed on the MPUs. The kernel autonomously executes the tasks on
the MPUs independently from the host. Whenever an MPU is free, the kernel will
load any available task into the MPU storage for execution.

The general flow for using the MARS task is as follows:

1. (host) Prepare the task program ELF image in host storage.\n
2. (host) Create task instances.\n
3. (host) Schedule tasks for execution.\n
4. (task) Schedule sub tasks for execution.\n
5. (task) Wait for sub task completion.\n
6. (task) Resume execution when all sub tasks have completed.\n
7. (task) Process and finish task execution.\n
8. (host) Wait for all tasks to complete.\n
9. (host) Destroy all task instances.\n

\see group_mars_task

\n
********************************************************************************
\section sec_7_2 7.2 Task Program

The MARS task program is the program written for the MPU and is the actual
code that will be executed when the task is run by the MARS kernel. Just as the
host program is compiled using the host compiler, these MPU programs must be
compiled using the MPU's compiler.

The MARS task program must define the \ref mars_task_main function, as that is
the main entry point of the program. This function is what gets called when the
kernel is ready to run the task.

A task program finishes execution when it calls \ref mars_task_exit or returns
from the \ref mars_task_main function.

The arguments (\ref mars_task_args) passed into the \ref mars_task_main function
are specified in the host program when calling \ref mars_task_schedule to
schedule the task for execution. If no args are specified when calling
\ref mars_task_schedule, the args passed into the \ref mars_task_main function
are uninitialized and their state is undefined.

\code
	/* sample MPU side mpu_prog.c */

	#include <stdio.h>
	#include <mars/task.h>

	int mars_task_main(const struct mars_task_args *task_args)
	{
		(void)task_args;

		printf("Hello World!\n");

		return 0;
	}
\endcode

\n
********************************************************************************
\section sec_7_3 7.3 Task Create

Typical MARS task creation code looks like the following.

\code
	/* sample host processor side host_prog.c */

	struct mars_context *mars_ctx;		/* MARS context pointer */
	struct mars_task_id task_id;		/* MARS task id instance */
	...

	/* Assume MARS context is created as shown above */
	...

	/* Create the task instance */
	int ret = mars_task_create(mars_ctx, &task_id, "Task", elf_image, 0);
	if (ret != MARS_SUCCESS)			/* error checking */
		return USER_DEFINED_ERROR;		/* create failed */
\endcode

MARS task creation will initialize a workload instance in the MARS context's
workload queue.

The initialized task id is returned to the user. The task id needs to be saved
for management of the task.

Once a task is created, it will not be executed until it is scheduled for
execution by calling \ref mars_task_schedule.

Any created tasks should be properly cleaned up with a call to \ref
mars_task_destroy when the task will no longer be scheduled for execution.

<b>Task creation parameters:</b>

\code
int mars_task_create(
	struct mars_context *mars,
	struct mars_task_id *id,
	const char *name,
	const void *elf_image,
	uint32_t context_save_size);
\endcode

\b mars
\n
This is the pointer to a created MARS context.

\b id
\n
This is the address of a task id instance that will be initialized upon
successful task creation.

\b name
\n
This specifies a string identifier for the task. The string length must be no
longer than \ref MARS_TASK_NAME_LEN_MAX.

\b elf_image
\n
This specifies the address to the MPU program ELF image loaded into host
storage. This MPU program needs to be a MARS task program.

\b context_save_size
\n
The size of context save area to allocate on host storage to be used during a
task context switch (<b>See</b> \ref sec_7_5).

\n
********************************************************************************
\section sec_7_4 7.4 Task Execution

Typical MARS task execution code looks like the following.

\code
	/* sample host processor side host_prog.c */

	struct mars_task_args task_args;        	/* MARS task args */

	/* Sets the task to a schedulable state */
	ret = mars_task_schedule(&task_id, &task_args, 0);
	if (ret != MARS_SUCCESS)			/* error checking */
		return USER_DEFINED_ERROR;		/* schedule failed */

	/* Host processor can process something while the MPUs execute the tasks asynchronously. */
	...

	/* Blocks until the scheduled task has finished execution */
	ret = mars_task_wait(&task_id, NULL);
	if (ret != MARS_SUCCESS)			/* error checking */
		return USER_DEFINED_ERROR;		/* wait failed */
\endcode

MARS task execution is done by scheduling a created task to be run by the MARS
kernel. The MARS kernels running on the MPUs will automatically schedule the
task and load it into MPU storage to begin execution.

While the MARS kernels process various workloads on the MPU side, the host is
free to do any other processing asynchronous to any workload processing on the
MPUs.

When the user chooses to do so, they can wait for a specific scheduled task to
finish execution.

Any number of host threads or tasks can wait for a specific task to complete
execution, as long as they hold the task's id. However, the task being waited on
should not be re-scheduled until all wait calls for the task have returned.
Otherwise, it is not guaranteed that all wait calls will return after the
completion of the initial schedule call.

After a MARS task is created, it may be scheduled for execution any number of
times until it is destroyed. However, a task can only be scheduled if it is not
currently in the process of execution.

A MARS task that has been created by the host can be scheduled for execution by
both the host and MPU-side APIs. The behavior of scheduling a task from host or
MPU is identical in nature. If a task schedules a sub task for execution, and
waits for the sub task to finish execution (assuming the use of a blocking wait
call), it will yield its own execution until the sub task has completed. This
allows for other workloads to be processed on the MPU that was executing the
waiting task.

<b>Task scheduling parameters:</b>

\code
int mars_task_schedule(
	struct mars_task_id *id,
	struct mars_task_args *args,
	uint8_t priority);
\endcode

\b id
\n
This is the pointer to the initialized task id of the task to be scheduled for
execution.

\b args
\n
This specifies the argument structure that will be passed into the task
program's \ref mars_task_main function. If NULL is specified for args, the args
passed into the \ref mars_task_main function are uninitialized and their state
is undefined. You should specify NULL only if you are certain the task program
will not access the args passed into the \ref mars_task_main function.

\b priority
\n
This specifies the priority of the task. Task priorities range from 0 to 255,
from lowest to highest priority. Higher priority tasks will be scheduled over
lower priority tasks if both are available to be scheduled for execution.
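This ordering can be illustrated with a toy selection routine in plain C. This is a conceptual sketch only, not the MARS kernel's actual scheduler; the `toy_workload` structure and `pick_next` function are hypothetical names invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

struct toy_workload {
	uint8_t priority;	/* 0 (lowest) .. 255 (highest) */
	int runnable;		/* nonzero if available to be scheduled */
};

/* Return the index of the runnable workload with the highest priority,
 * or -1 if nothing is runnable. Earlier entries win ties. */
static int pick_next(const struct toy_workload *w, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!w[i].runnable)
			continue;
		if (best < 0 || w[i].priority > w[best].priority)
			best = (int)i;
	}
	return best;
}
```

For example, with runnable workloads of priority 10 and 50 and a blocked workload of priority 200, `pick_next` selects the priority-50 entry.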

\n
********************************************************************************
\section sec_7_5 7.5 Task Switching

A MARS task switch occurs when a running task either yields or enters a waiting
state due to a call to some blocking synchronization method. The task switch,
performed by the MARS kernel, allows the current state of the task to be saved
to a pre-allocated context save area in host storage. The context save area is
allocated during task creation if a context save area size is specified in the
task parameters.

When the task is no longer in a waiting state and is scheduled by the kernel to
run again, the saved task context will be restored from host storage back into
MPU storage so that task execution resumes where it left off.

This task switching allows the kernel to schedule other workloads to be executed
on the MPU without wasting valuable processing time while some tasks are left in
a waiting state.

<b>Fig. 7.5a</b>
\image html img_task_switch.png

\n
<b>Limitations</b>

It is important to note the limitations of a task switch:

<b>1.</b> A task is only capable of doing a task switch if it is created with a
context save area (<b>See</b> \ref mars_task_create).
If no context save area is specified for the task, yield calls and any blocking
calls that may put the task into a waiting state will result in an error.

<b>2.</b> Any MPU-side task API call that may call into the MARS kernel
scheduler and enter a waiting state is referred to as a <b>Task Switch Call</b>
in \ref sec_10.
Before calling any MPU-side <b>Task Switch Call</b>, the user is responsible
for making sure that all memory transfer operations have completed.
If any memory transfer operations are incomplete when a task switch occurs, the
effects are undefined.

<b>3.</b> All MPU-side task API calls that internally handle memory transfers
(*_begin/*_end) must not call any other MPU-side <b>Task Switch Call</b>
in between the pair of *_begin and *_end calls.
The reason for this limitation is the same as in <b>(2)</b>:
the *_begin call, whether it is a <b>Task Switch Call</b> or not, may begin a
memory transfer, and that memory transfer is not guaranteed to be completed
until the paired *_end call.

\n
<b>Context save size</b>

When creating the task, you must specify the size of the context save area that
will be allocated and used during a task switch. By default, the task module
will only save and restore the used areas of MPU storage necessary to perform
the task switch.

You can specify one of the following for 'context_save_size' when creating the
task with \ref mars_task_create :

<b>1. 0</b> - No context save area will be allocated. Use this to create a
run-complete task that never does a task switch.

<b>2. \ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX</b> - The maximum necessary area will
be allocated for a context save. This option always allocates the maximum area
required for any task to context switch, regardless of whether the particular
task being created needs all of that area.

<b>Fig. 7.5b</b>
\image html img_task_switch_max.png

<b>3. user specified size</b> - The user can specify the size of the context
save area necessary to task switch their specific task. For example, if the
task's text, data, heap and stack occupy only N bytes of MPU storage,
context_save_size = N can be specified to avoid allocating
\ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX bytes, which would waste
\ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX - N bytes of unused host storage space.

<b>Fig. 7.5c</b>
\image html img_task_switch_optimal.png

\n
********************************************************************************
\section sec_7_6 7.6 Task Destroy

You must properly destroy any MARS tasks you created in order to free up the
resources used by them.

\code
	/* sample host processor side host_prog.c */

	/* Destroy the task previously created */
	ret = mars_task_destroy(&task_id);
	if (ret != MARS_SUCCESS)			/* error checking */
		return USER_DEFINED_ERROR;		/* destroy failed */
\endcode

MARS task destroy will cleanup the created task and finalize the workload
instance in the workload queue. Once the task is destroyed, the task's resources
will be freed.

This function should be called when the task will no longer be scheduled for
execution by a call to \ref mars_task_schedule. Once a task is destroyed, the
task and task id will become obsolete.

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_8 8 Task Synchronization

\n
********************************************************************************
\section sec_8_1 8.1 Overview

\copydoc group_mars_task_sync

<b>Fig. 8.1</b>
\image html img_task_sync.png

As shown in <b>Fig. 8.1</b>, task synchronization instances are created in host
storage. Both the host program and MPU program's MARS task access these
instances resident on the host storage.

\see group_mars_task_sync

\n
********************************************************************************
\section sec_8_2 8.2 Benefits

The MARS Task Synchronization API is specific to the MARS tasks. Many of the
synchronization methods provided are blocking calls, meaning that when called
and certain conditions are not met, the calling tasks will enter a waiting state
and may result in a task switch (<b>See</b> \ref sec_8_6).

<b>Fig. 8.2</b>
\image html img_task_sync_benefits.png

In <b>Fig. 8.2</b>, the semaphore synchronization method is used as an example
to show the benefit of using the MARS task synchronization over a simple
synchronization method.

When using simple synchronization methods within a MARS task, if the
synchronization method blocks, it will force the task to wait until the
synchronization method allows for execution to resume. If a task must wait on
some synchronization method for a very long time, the MPU executing the task
will be forced to block without being able to process anything else during that
time.

The MARS task synchronization methods prevent the wasting of valuable MPU
processing time during the time a task blocks on some synchronization method.
When a MARS task blocks on some synchronization method, the task itself will
enter a waiting state. This allows for the MPU executing the task to do a task
switch, allowing it to execute some other task that is not in a waiting state.
Once the original task in the waiting state receives the synchronization event
it was waiting for, it is returned to a runnable state and will be scheduled
for resumed execution when an MPU becomes available.

\n
********************************************************************************
\section sec_8_3 8.3 Task Barrier

\copydoc group_mars_task_barrier

The general flow for using the MARS task barriers is as follows:

1. (host) Allocate memory for task barrier structure.\n
2. (host) Create task barrier.\n
3. (host) Create tasks and schedule for execution.\n
4. (task) Process until synchronization point.\n
5. (task) Notify barrier of synchronization point arrival.\n
6. (task) Wait until all tasks notify barrier and barrier is released.\n
7. (task) Finish task execution.\n
8. (host) Wait for task completion and finalize tasks.\n
9. (host) Destroy task barrier and free allocated memory.\n

<b>Fig. 8.3</b>
\image html img_task_barrier.png

In <b>Fig. 8.3</b>, a MARS task barrier is created to wait on notifications
from 3 separate tasks.

First, Task A reaches the synchronization point and notifies the barrier.
Since the barrier has not yet been released, Task A enters a wait state and
yields the MPU to execute another Task X.

Next, Task C reaches the synchronization point soon after and, after notifying
the barrier, yields MPU execution to another Task Y.
Finally, Task B reaches the synchronization point, at which point it notifies
the barrier and the barrier is released.

Once the barrier is released, Task B continues with execution while both Tasks A
and C are available to be scheduled for execution as soon as there is an
available MPU.
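The notify-and-release behavior described above can be modeled with a toy counter in plain C. This is a sketch of the semantics only, not the MARS task barrier API; `toy_barrier` and `toy_barrier_notify` are hypothetical names.

```c
#include <stdint.h>

/* Toy model of barrier semantics: each task notifies on arrival,
 * and the barrier releases once all expected tasks have arrived. */
struct toy_barrier {
	uint32_t total;		/* tasks expected at the barrier */
	uint32_t notified;	/* tasks that have arrived so far */
};

/* Returns 1 if this notification releases the barrier, or 0 if the
 * caller would enter a waiting state (and yield its MPU). */
static int toy_barrier_notify(struct toy_barrier *b)
{
	b->notified++;
	if (b->notified < b->total)
		return 0;	/* still waiting on other tasks */
	b->notified = 0;	/* release and reset for reuse */
	return 1;
}
```

In the Fig. 8.3 scenario, the notifications from Task A and Task C would return 0 (they wait), and Task B's notification would return 1 (the barrier is released).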

\see group_mars_task_barrier

\n
********************************************************************************
\section sec_8_4 8.4 Task Event Flag

\copydoc group_mars_task_event_flag

The general flow for using the MARS task event flag is as follows:

1. (host) Allocate memory for task event flag structure.\n
2. (host) Create task event flag.\n
3. (host) Create tasks and schedule for execution.\n
4. (task) Process until synchronization point.\n
5. (task) Wait until specified event flag bit is set.\n
6. (host or task) Set the specified event flag bit.\n
7. (task) Finish task execution.\n
8. (host) Wait for task completion and finalize tasks.\n
9. (host) Destroy task event flag and free allocated memory.\n

<b>Fig. 8.4</b>
\image html img_task_event_flag.png

In <b>Fig. 8.4</b>, there are 2 separate MARS event flags created. One event
flag is created for host to MPU communication, while the other is created for
MPU to MPU communication.

First, Task A reaches the synchronization point and waits for a specific event
flag bit to be set. As it waits for the event, it enters the wait state and
yields execution of the MPU so that Task X can run.

Next, Task B reaches its synchronization point and allows for Task Y to run
while it waits for the event.

Next, the host program sets the event flag bit Task A is waiting on, at which
point Task A becomes available for resumed execution.

Finally, as Task A becomes scheduled and resumes execution, it then sets the
event flag bit Task B is waiting on, at which point Task B becomes available for
resumed execution.
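The bit-mask semantics described above can be sketched as a toy model in plain C. This is an illustration only, not the MARS task event flag API; the `toy_event_flag` names are hypothetical.

```c
#include <stdint.h>

/* Toy model of event flag semantics: setters OR bits into the flag,
 * and a waiter resumes once all bits in its mask are set. */
struct toy_event_flag {
	uint32_t bits;
};

static void toy_event_flag_set(struct toy_event_flag *f, uint32_t mask)
{
	f->bits |= mask;
}

/* Returns 1 if the wait condition is met (all bits in mask are set),
 * or 0 if the task would remain in the waiting state. */
static int toy_event_flag_test_all(const struct toy_event_flag *f,
				   uint32_t mask)
{
	return (f->bits & mask) == mask;
}
```

In the Fig. 8.4 scenario, Task A's wait condition becomes true only after the host sets the bit Task A is testing for.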

\see group_mars_task_event_flag

\n
********************************************************************************
\section sec_8_5 8.5 Task Queue

\copydoc group_mars_task_queue

The general flow for using the MARS task queue is as follows:

1. (host) Allocate memory for task queue structure.\n
2. (host) Create task queue.\n
3. (host) Create tasks and schedule for execution.\n
4. (task) Process until synchronization point.\n
5. (task) Pop queue and wait until data is available.\n
6. (host or task) Push queue with data.\n
7. (task) Receive data and finish task execution.\n
8. (host) Wait for task completion and finalize tasks.\n
9. (host) Destroy task queue and free allocated memory.\n

<b>Fig. 8.5</b>
\image html img_task_queue.png

In <b>Fig. 8.5</b>, a MARS task queue instance is created to send and receive
data between a host program and MARS tasks.

First, Task A reaches the synchronization point, where it requests to pop data
from the queue. At this point in time, nothing has been pushed into the queue,
so the queue is empty. This causes Task A to enter a wait state and yield MPU
execution to Task X.

Next, Task B reaches its synchronization point and requests to pop data. Since
the queue is still empty, it also enters the waiting state and yields MPU
execution to another Task Y.

Next, the host program pushes some data into the queue, at which point Task A
becomes available for resumed execution, having received the data from the host.

Finally, as Task A becomes scheduled and resumes execution, it then pushes some
other data into the queue, at which point Task B becomes available for resumed
execution, having received the data from Task A.
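The queue behavior described above can be modeled as a simple FIFO ring buffer in plain C. This is a toy sketch with hypothetical names, not the MARS task queue API; instead of blocking, popping an empty queue (or pushing a full one) reports that the caller would enter a waiting state.

```c
#include <stdint.h>

#define TOY_QUEUE_DEPTH 4	/* arbitrary depth for illustration */

struct toy_queue {
	uint64_t data[TOY_QUEUE_DEPTH];
	unsigned head;		/* index of oldest entry */
	unsigned count;		/* number of entries queued */
};

/* Returns 1 on success, or 0 if the pusher would enter a waiting state. */
static int toy_queue_push(struct toy_queue *q, uint64_t v)
{
	if (q->count == TOY_QUEUE_DEPTH)
		return 0;	/* full */
	q->data[(q->head + q->count) % TOY_QUEUE_DEPTH] = v;
	q->count++;
	return 1;
}

/* Returns 1 on success, or 0 if the popper would enter a waiting state. */
static int toy_queue_pop(struct toy_queue *q, uint64_t *v)
{
	if (q->count == 0)
		return 0;	/* empty */
	*v = q->data[q->head];
	q->head = (q->head + 1) % TOY_QUEUE_DEPTH;
	q->count--;
	return 1;
}
```

In the Fig. 8.5 scenario, Task A's and Task B's initial pops would return 0 (they wait), and each subsequent push makes one waiting pop succeed in FIFO order.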

\see group_mars_task_queue

\n
********************************************************************************
\section sec_8_6 8.6 Task Semaphore

\copydoc group_mars_task_semaphore

The general flow for using the MARS task semaphore is as follows:

1. (host) Allocate memory for task semaphore structure.\n
2. (host) Create task semaphore.\n
3. (host) Create tasks and schedule for execution.\n
4. (task) Process until synchronization point.\n
5. (task) Acquire semaphore and wait until semaphore is obtained.\n
6. (task) Modify shared resource data.\n
7. (task) Release semaphore and finish execution.\n
8. (host) Wait for task completion and finalize tasks.\n
9. (host) Destroy task semaphore and free allocated memory.\n

<b>Fig. 8.6</b>
\image html img_task_semaphore.png

In <b>Fig. 8.6</b>, a MARS semaphore is created to be shared between 2 MARS
tasks. This semaphore is used to prevent simultaneous access to some shared
data in host storage.

First, Task A reaches the synchronization point, where it requests to acquire
the semaphore. Since no other task holds the semaphore, Task A successfully
acquires it without having to wait. It then continues execution to modify some
shared data in host storage.

Next, Task B reaches the synchronization point where it requests to acquire the
same semaphore to modify the same shared data in host storage. At the time of
the request to acquire the semaphore, Task A still holds the semaphore, causing
Task B to enter a waiting state. As Task B is waiting, it yields MPU execution
to another Task X.

Next, Task A completes modifying the shared data in host storage and releases
the semaphore. This allows Task B to become available for resumed execution.

Finally, as Task B becomes scheduled for resumed execution, it continues to
modify the shared data in host storage. Task B then releases the semaphore when
access to the shared data is complete.
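The acquire-and-release behavior described above can be modeled with a toy counting semaphore in plain C. This is a sketch of the semantics only, not the MARS task semaphore API; the `toy_semaphore` names are hypothetical.

```c
/* Toy model of semaphore semantics: a count of how many tasks
 * may still acquire before further acquirers must wait. */
struct toy_semaphore {
	int count;
};

/* Returns 1 on a successful acquire, or 0 if the task would
 * enter a waiting state (and yield its MPU). */
static int toy_semaphore_try_acquire(struct toy_semaphore *s)
{
	if (s->count <= 0)
		return 0;
	s->count--;
	return 1;
}

static void toy_semaphore_release(struct toy_semaphore *s)
{
	s->count++;
}
```

With a count of 1, this behaves like the Fig. 8.6 scenario: Task A acquires, Task B's acquire fails (it waits), and Task A's release lets Task B acquire.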

\see group_mars_task_semaphore

\n
********************************************************************************
\section sec_8_7 8.7 Task Signal

\copydoc group_mars_task_signal

The general flow for using the MARS task signal is as follows:

1. (host) Create tasks and schedule for execution.\n
2. (task) Process until synchronization point.\n
3. (task) Wait for signal.\n
4. (host or task) Send signal to the waiting task.\n
5. (task) Resume and finish execution.\n
6. (host) Wait for task completion and destroy tasks.\n

<b>Fig. 8.7</b>
\image html img_task_signal.png

In <b>Fig. 8.7</b>, there is a host program using signals to synchronize
execution between 2 MARS tasks.

First, Task A reaches the synchronization point, where it waits on a signal. At
this point in time, nothing has signalled Task A, so it enters a wait state and
yields MPU execution to Task X.

Next, Task B reaches its synchronization point and waits on a signal. Since
nothing has signalled Task B, it also enters the waiting state and yields MPU
execution to another Task Y.

Next, the host program sends a signal to Task A, at which point Task A becomes
available for resumed execution.

Finally, as Task A becomes scheduled and resumes execution, it signals Task B,
at which point Task B becomes available for resumed execution.
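The send-and-wait behavior described above can be modeled with a per-task pending flag in plain C. This is a toy sketch with hypothetical names, not the MARS task signal API.

```c
/* Toy model of signal semantics: a send marks the target task,
 * and a wait consumes one pending signal or would block. */
struct toy_task_state {
	int signal_pending;
};

static void toy_signal_send(struct toy_task_state *t)
{
	t->signal_pending = 1;
}

/* Returns 1 if a pending signal was consumed, or 0 if the task
 * would enter a waiting state (and yield its MPU). */
static int toy_signal_try_wait(struct toy_task_state *t)
{
	if (!t->signal_pending)
		return 0;
	t->signal_pending = 0;
	return 1;
}
```

In the Fig. 8.7 scenario, Task A's initial wait finds no pending signal (it waits); the host's send makes Task A's wait succeed, and Task A's send then releases Task B.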

\see group_mars_task_signal

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_9 9 Task Tutorials

This section gives various code tutorials on the basic usage of the MARS Task
and MARS Task Synchronization API.

\note

- The code in the tutorial is simplified for ease of explanation. Please refer
to the actual code provided in the MARS Samples package for the full
implementations.

- The code in the tutorial does not properly check return values for all calls
into the MARS API, in order to simplify the coding examples. In your actual
code, the return value of every MARS API function call should be checked
accordingly.

- The code in the tutorial is platform independent. Any platform specific
implementation details are either not shown, or a generic placeholder is used
in place of platform specific code. Please refer to the specific code
explanation for the details.

- All the tutorials assume knowledge from other sections of this documentation
and the detailed explanation of previous tutorials within this section.

\n
********************************************************************************
\section sec_9_1 9.1 Task Execution from Host

This tutorial will explain how to prepare and schedule a MARS task for
execution from the host program.

The sample code creates and schedules a task that prints "Hello!" to stdout and
exits.

\n
<b>(host program)</b>
\code
 1	#include <mars/task.h>
 2
 3	static void *task_program_elf_image;
 4
 5	int main(void)
 6	{
 7		struct mars_context *mars_ctx;
 8		struct mars_task_id task_id;
 9		int task_exit_code;
10
11		mars_context_create(&mars_ctx, 0, 0);
12		mars_task_create(mars_ctx, &task_id, "Task", task_program_elf_image, 0);
13		mars_task_schedule(&task_id, NULL, 0);
14		mars_task_wait(&task_id, &task_exit_code);
15		mars_task_destroy(&task_id);
16		mars_context_destroy(mars_ctx);
17
18		return 0;
19	}
\endcode

<table border="1">

<tr><td><b>Line:1</b></td><td>
Include the header file "mars/task.h" necessary for utilizing the MARS task
library.
</td></tr>

<tr><td><b>Line:3</b></td><td>
Pointer to the task program's ELF image in host storage. The procedure to
load the task program into host storage is platform specific. Therefore, the
code to do so is not shown anywhere in this sample code.
</td></tr>

<tr><td><b>Line:7</b></td><td>
Declare the MARS context pointer.
</td></tr>

<tr><td><b>Line:8</b></td><td>
Declare the structure for storing the MARS task id.
</td></tr>

<tr><td><b>Line:9</b></td><td>
Declare the instance to store the task exit code.
</td></tr>

<tr><td><b>Line:11</b></td><td>
Create the MARS context instance.

int \ref mars_context_create (

<b>arg1</b>:
This is the address of the pointer to MARS context declared at <b>Line:7</b>.
A MARS context will be created and its address stored in this pointer.

<b>arg2</b>:
This is the number of MPUs you want utilized by this MARS context. The number of
MPUs specified must be available on the system or an error is returned. Here 0
is specified to have MARS utilize all the available MPUs for the context.

<b>arg3</b>:
This specifies if you are requesting a shared context. Here 0 is specified since
we do not require sharing the MARS context for this sample.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

</td></tr>

<tr><td><b>Line:12</b></td><td>
Create the MARS task instance.

int \ref mars_task_create (

<b>arg1</b>:
Pass in the MARS context pointer.

<b>arg2</b>:
Pass in the pointer to MARS task id structure declared at <b>Line:8</b>. Upon
successful completion, the task id will be initialized as required.

<b>arg3</b>:
Specify the NULL terminated string name of the task you want to create.

<b>arg4</b>:
Specify the address of the task program's ELF image that is loaded into host
storage. The task program specified here is what will be loaded into MPU storage
for execution when this task is scheduled to run by the MARS kernel.

<b>arg5</b>:
Specify the context save area size for this task. Since this task will not task
switch, we do not need to specify a context save size so specify 0. Otherwise,
if we want to create a task that can task switch we must specify a context save
size or specify \ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

</td></tr>

<tr><td><b>Line:13</b></td><td>
Schedule the task for execution.

int \ref mars_task_schedule (

<b>arg1</b>:
Pass in the pointer to the task id initialized at <b>Line:12</b>.

<b>arg2</b>:
Pass in the pointer to the task arg structure we want to pass into the task
program's \ref mars_task_main function. For this sample we do not need to pass
any args into the task program so specify NULL.

<b>arg3</b>:
Pass in the value for the scheduling priority of this task. Since we only
schedule 1 task for execution, the scheduling priority has no effect in this
example.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

</td></tr>

<tr><td><b>Line:14</b></td><td>
Wait for the completion of the task.

int \ref mars_task_wait (

<b>arg1</b>:
Pass in the pointer to the task id we want to wait for.

<b>arg2</b>:
Pass in the address of the variable to store the task exit code declared at
<b>Line:9</b>.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

This call will block until the task previously scheduled in <b>Line:13</b>
completes execution. If we want to process some other tasks in the host program
while waiting for the task to complete, we can do so before calling wait.
Similarly, a non-blocking wait function \ref mars_task_try_wait is also provided
to poll for task completion.
</td></tr>

<tr><td><b>Line:15</b></td><td>
Destroy the completed task.

int \ref mars_task_destroy (

<b>arg1</b>:
Pass in the pointer to the task id we want to destroy.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

We can only call this function when we are sure the task has finished. In this
example we are sure of completion because we properly waited for task completion
in <b>Line:14</b>. After the task is destroyed, we can no longer schedule this
task for execution.
</td></tr>

<tr><td><b>Line:16</b></td><td>
Destroy the MARS context.

int \ref mars_context_destroy (

<b>arg1</b>:
Pass in the pointer to the MARS context we want to finalize.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

This unloads all running MARS kernels from the MPUs and handles any necessary
cleanup for the MARS library. No more MARS API calls can be made after this
function until the MARS context is created once again.
</td></tr>

</table>

\n
<b>(task program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	int mars_task_main(const struct mars_task_args *task_args)
 5	{
 6		(void)task_args;
 7
 8		printf("MPU(%d): %s - Hello!\n",
 9			mars_task_get_kernel_id(), mars_task_get_name());
10
11		return 0;
12	}
\endcode

<table border="1">

<tr><td><b>Lines:1-2</b></td><td>
Include the header file "stdio.h" for printf and "mars/task.h" necessary for
utilizing the MARS task library.
</td></tr>

<tr><td><b>Line:6</b></td><td>
Since we specified NULL for the task args in <b>Line:13</b> of the <b>host
program</b> above, the state of task_args is undefined. In this program we
do not and should not access the task_args.
</td></tr>

<tr><td><b>Lines:8-9</b></td><td>
Print out message to stdout. The calls to \ref mars_task_get_kernel_id returns
the id of the kernel that the current task is running on. The calls to \ref
mars_task_get_name return the string name of the current running task specified
during task creation at <b>Line:12</b> of the <b>host program</b> above.
</td></tr>

<tr><td><b>Line:11</b></td><td>
Returning from \ref mars_task_main completes execution of the task. This will
signal anything waiting for this task's completion to resume execution. In this
example, the <b>host program</b>'s call to \ref mars_task_wait in
<b>Line:14</b> will return. Instead of returning from \ref mars_task_main, we
can equivalently call \ref mars_task_exit. The return value will be returned to
the <b>host program</b> in the variable passed into \ref mars_task_wait.
</td></tr>

</table>

\n
********************************************************************************
\section sec_9_2 9.2 Task Execution from MPU

This tutorial will explain how to create and schedule a MARS task for execution
from another task program.

The sample code creates 3 separate task instances: one instance of the main
task 1 program and 2 instances of the sub task 2 program.

The first main task is scheduled for execution by the host. The main task then
schedules the 2 instances of the sub task for execution, using the sub task ids
specified in the arguments passed in by the host during scheduling.

Each instance of the sub task will print out "Hello!" and a unique value
specified by the arguments passed in by the main task during scheduling.

\n
<b>(host program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	#define NUM_SUB_TASKS	2
 5
 6	static void *task1_program_elf_image;
 7	static void *task2_program_elf_image;
 8
 9	int main(void)
10	{
11		struct mars_context *mars_ctx;
12		struct mars_task_id task1_id;
13		struct mars_task_id task2_id[NUM_SUB_TASKS];
14		struct mars_task_args task_args;
15		int i;
16
17		mars_context_create(&mars_ctx, 0, 0);
18
19		mars_task_create(mars_ctx, &task1_id, "Task 1", task1_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
20
21		for (i = 0; i < NUM_SUB_TASKS; i++) {
22			char name[16];
23			sprintf(name, "Task 2.%d", i);
24			mars_task_create(mars_ctx, &task2_id[i], name, task2_program_elf_image, 0);
25		}
26
27		task_args.type.u64[0] = mars_ptr_to_ea(&task2_id[0]);
28		task_args.type.u64[1] = mars_ptr_to_ea(&task2_id[1]);
29
30		mars_task_schedule(&task1_id, &task_args, 0);
31		mars_task_wait(&task1_id, NULL);
32		mars_task_destroy(&task1_id);
33
34		for (i = 0; i < NUM_SUB_TASKS; i++)
35			mars_task_destroy(&task2_id[i]);
36
37		mars_context_destroy(mars_ctx);
38
39		return 0;
40	}
\endcode

<table border="1">

<tr><td><b>Line:13</b></td><td>
Declare an instance of the task id structure for each sub task we want to
create and schedule.
</td></tr>

<tr><td><b>Line:19</b></td><td>
Create the main task instance with the ELF image of task program 1. The main
task needs to provide a context save area in order to allow for context
switching while waiting for sub task completion. Specify
\ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX for the context save area size so a context save
area is initialized for the main task context.
</td></tr>

<tr><td><b>Lines:21-25</b></td><td>
Create the 2 sub task instances with the ELF image of task program 2. The
sub tasks do not need to context switch, so no context save area size needs to
be specified.
</td></tr>

<tr><td><b>Lines:27-28</b></td><td>
The main task needs to know the addresses of the task ids it plans to schedule
for execution. Store each sub task id address into the task args passed into
the main task's \ref mars_task_main function.
</td></tr>

<tr><td><b>Line:30</b></td><td>
Schedule the main task for execution. Pass in the task args we initialized with
the sub task id addresses at <b>Lines:27-28</b>. Since we only schedule 1 main
task for execution, and the main task is waiting while its sub tasks are being
executed, the scheduling priority specified has no effect in this example.
</td></tr>

<tr><td><b>Line:31</b></td><td>
Wait for the completion of the main task.
</td></tr>

<tr><td><b>Line:32</b></td><td>
Destroy the completed main task.
</td></tr>

<tr><td><b>Lines:34-35</b></td><td>
Also destroy the completed sub tasks.
</td></tr>

</table>

\n
<b>(task 1 program)</b>
\code
 1	#include <mars/task.h>
 2
 3	int mars_task_main(const struct mars_task_args *task_args)
 4	{
 5		struct mars_task_id task2_0_id;
 6		struct mars_task_id task2_1_id;
 7		struct mars_task_args args;
 8
 9		get(&task2_0_id, task_args->type.u64[0], sizeof(task2_0_id));
10		get(&task2_1_id, task_args->type.u64[1], sizeof(task2_1_id));
11
12		args.type.u32[0] = 123;
13		mars_task_schedule(&task2_0_id, &args, 0);
14
15		args.type.u32[0] = 321;
16		mars_task_schedule(&task2_1_id, &args, 0);
17
18		mars_task_wait(&task2_0_id, NULL);
19		mars_task_wait(&task2_1_id, NULL);
20
21		return 0;
22	}
\endcode

<table border="1">

<tr><td><b>Line:3</b></td><td>
Since the task args were passed into \ref mars_task_schedule at <b>Line:30</b>
of the <b>host program</b>, task_args is pointing to an initialized \ref
mars_task_args structure.
</td></tr>

<tr><td><b>Line:5</b></td><td>
Declare an instance to store the task id of the first sub task to execute.
</td></tr>

<tr><td><b>Line:6</b></td><td>
Declare an instance to store the task id of the second sub task to execute.
</td></tr>

<tr><td><b>Line:7</b></td><td>
Declare an instance of the task args structure we want to initialize with
unique values to pass into the sub tasks.
</td></tr>

<tr><td><b>Lines:9-10</b></td><td>
Memory transfer from host storage to MPU storage the task id structures of the
initialized sub tasks. The host storage addresses of these task id structures
were specified at <b>Lines:27-28</b> of the <b>host program</b>.

The function "get" shown here is a generic placeholder for the platform
specific function that performs the memory transfer. Please refer to your
platform specific API to learn how to transfer memory from host storage to MPU
storage on your platform.
</td></tr>

<tr><td><b>Lines:12-13</b></td><td>
Initialize the task args structure with a unique value. Schedule the first sub
task instance using the task id obtained at <b>Line:9</b>. Pass in the task
args and priority of 0.
</td></tr>

<tr><td><b>Lines:15-16</b></td><td>
Initialize the task args structure with a unique value. Schedule the second sub
task instance using the task id obtained at <b>Line:10</b>. Pass in the task
args and priority of 0.
</td></tr>

<tr><td><b>Lines:18-19</b></td><td>
Wait for the completion of both sub tasks. If the first sub task has not
finished execution by the time of the call to \ref mars_task_wait at
<b>Line:18</b>, this main task will enter a wait state and its context will be
switched out. When the first sub task completes execution, this main task will
resume execution and continue on to wait for the second sub task to complete.
Similarly, at the time of the call to \ref mars_task_wait at <b>Line:19</b>, if
the second sub task has not yet completed it will enter a wait state once again
until completion of the second sub task.
</td></tr>

</table>

\n
<b>(task 2 program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	int mars_task_main(const struct mars_task_args *task_args)
 5	{
 6		printf("MPU(%d): %s - Hello! (%d)\n",
 7			mars_task_get_kernel_id(), mars_task_get_name(),
 8			task_args->type.u32[0]);
 9
10		return 0;
11	}
\endcode

<table border="1">

<tr><td><b>Line:4</b></td><td>
Since the task args were passed into \ref mars_task_schedule at <b>Line:13</b>
and <b>Line:16</b> of the main <b>task 1 program</b>, task_args is pointing to
an initialized \ref mars_task_args structure. This structure contains the unique
value specified by the main <b>task 1 program</b>.
</td></tr>

<tr><td><b>Lines:6-8</b></td><td>
Print out message to stdout. Print out the unique value specified by the main
<b>task 1 program</b>. This value should be unique for each sub task program.
</td></tr>

</table>

\n
********************************************************************************
\section sec_9_3 9.3 Task Barrier Usage

This tutorial will explain how to use the MARS task barrier to synchronize
execution between multiple MARS tasks.

The sample code creates a task barrier and 10 task instances of a task program.
Each task program must do several iterations of some pre-processing work and
some post-processing work. For each iteration, all tasks must complete the
pre-processing work before any tasks can continue to do the post-processing
work. In order to synchronize the tasks to accomplish this, a task barrier will
be used. After finishing the pre-processing and before starting the
post-processing work, the tasks will notify arrival to the barrier. Once all
tasks notify the barrier and the barrier is released, all tasks can proceed to
finish the post-processing work.

\n
<b>(host program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	#define NUM_TASKS	10
 5
 6	static void *task_program_elf_image;
 7
 8	int main(void)
 9	{
10		struct mars_context *mars_ctx;
11		struct mars_task_id task_id[NUM_TASKS];
12		struct mars_task_args task_args;
13		uint64_t barrier_ea;
14		int i;
15
16		mars_context_create(&mars_ctx, 0, 0);
17
18		mars_task_barrier_create(mars_ctx, &barrier_ea, NUM_TASKS);
19
20		for (i = 0; i < NUM_TASKS; i++) {
21			char name[16];
22			sprintf(name, "Task %d", i);
23
24			mars_task_create(mars_ctx, &task_id[i], name, task_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
25
26			task_args.type.u64[0] = barrier_ea;
27			mars_task_schedule(&task_id[i], &task_args, 0);
28		}
29
30		for (i = 0; i < NUM_TASKS; i++) {
31			mars_task_wait(&task_id[i], NULL);
32			mars_task_destroy(&task_id[i]);
33		}
34
35		mars_task_barrier_destroy(barrier_ea);
36
37		mars_context_destroy(mars_ctx);
38
39		return 0;
40	}
\endcode

<table border="1">

<tr><td><b>Line:11</b></td><td>
Declare an array of 10 task ids for each instance of the task program we plan to
create and schedule.
</td></tr>

<tr><td><b>Line:13</b></td><td>
Declare an instance of the task barrier ea.
</td></tr>

<tr><td><b>Line:18</b></td><td>
Create the task barrier instance.

int \ref mars_task_barrier_create (

<b>arg1</b>:
Pass in the MARS context pointer.

<b>arg2</b>:
Pass in the address of the barrier ea we declared at <b>Line:13</b>.

<b>arg3</b>:
Pass in the total number of task notifications to wait for before the barrier is
released.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

</td></tr>

<tr><td><b>Line:24</b></td><td>
Create each of the 10 task instances.
Specify the task program ELF image for these task instances and a context save
area size of \ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow these tasks to context
switch.
</td></tr>

<tr><td><b>Lines:26-27</b></td><td>
Initialize the task args we want passed into the task program's \ref
mars_task_main function. Store the barrier ea in the task args.
Schedule the task instance for execution, passing in the task args.
</td></tr>

<tr><td><b>Lines:30-37</b></td><td>
Wait for completion and destroy all 10 task instances. Finally destroy the
barrier instance and the MARS context.
</td></tr>

</table>

\n
<b>(task program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	#define ITERATIONS 3
 5
 6	int mars_task_main(const struct mars_task_args *task_args)
 7	{
 8		int i;
 9		uint64_t barrier_ea = task_args->type.u64[0];
10
11		for (i = 0; i < ITERATIONS; i++) {
12			pre_barrier_process();
13
14			mars_task_barrier_notify(barrier_ea);
15			mars_task_barrier_wait(barrier_ea);
16
17			post_barrier_process();
18		}
19
20		return 0;
21	}
\endcode

<table border="1">

<tr><td><b>Line:6</b></td><td>
Since the task args were passed into \ref mars_task_schedule at <b>Line:27</b>
of the <b>host program</b>, task_args is pointing to an initialized \ref
mars_task_args structure.
</td></tr>

<tr><td><b>Line:9</b></td><td>
Grab the ea of the barrier initialized in the <b>host program</b> from the task
arg structure.
</td></tr>

<tr><td><b>Line:11</b></td><td>
Do several iterations of processing with the task. Each iteration of processing
will be synchronized by the barrier.
</td></tr>

<tr><td><b>Line:12</b></td><td>
Do some pre barrier processing. For this sample assume it processes some dummy
work.
</td></tr>

<tr><td><b>Line:14</b></td><td>
Notify the barrier that we have arrived at the synchronization point.

int \ref mars_task_barrier_notify (

<b>arg1</b>:
Pass in the ea of the barrier.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)
</td></tr>

<tr><td><b>Line:15</b></td><td>
Wait for the barrier to be released.

int \ref mars_task_barrier_wait (

<b>arg1</b>:
Pass in the ea of the barrier.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

If the barrier has not been released by the time of this call, this means the
other tasks have not yet finished the pre barrier processing and notified the
barrier yet. If this is the case, this task will enter a wait state and its
context will be switched out. When all tasks notify the barrier and the barrier
is released, this task will resume execution and continue.
</td></tr>

<tr><td><b>Line:17</b></td><td>
Do some post barrier processing. For this sample assume it processes some dummy
work.
</td></tr>

</table>

\n
********************************************************************************
\section sec_9_4 9.4 Task Event Flag Usage

This tutorial will explain how to use the MARS task event flag to synchronize
execution between the host program and MARS tasks.

This sample code creates 2 task instances for task 1 program and task 2
program and creates 3 event flags.

The first event flag is used to synchronize between the host program and task 1.
Task 1 can only begin processing after the host program has waited 1 second and
sets the event flag for task 1 to begin.

The second event flag is used to synchronize between the 2 tasks. Task 2 can
only begin processing after task 1 has completed its processing and sets the
event flag for task 2 to begin.

The third event flag is used to synchronize between task 2 and the host program.
The host program waits until task 2 has completed its processing and sets the
event flag for the host program to continue and finish execution.

\n
<b>(host program)</b>
\code
 1	#include <unistd.h>
 2	#include <mars/task.h>
 3
 4	static void *task1_program_elf_image;
 5	static void *task2_program_elf_image;
 6
 7	int main(void)
 8	{
 9		struct mars_context *mars_ctx;
10		struct mars_task_id task1_id;
11		struct mars_task_id task2_id;
12		struct mars_task_args task_args;
13		uint64_t host_to_mpu_ea;
14		uint64_t mpu_to_host_ea;
15		uint64_t mpu_to_mpu_ea;
16
17		mars_context_create(&mars_ctx, 0, 0);
18
19		mars_task_event_flag_create(mars_ctx, &host_to_mpu_ea,
20						MARS_TASK_EVENT_FLAG_HOST_TO_MPU,
21						MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
22
23		mars_task_event_flag_create(mars_ctx, &mpu_to_host_ea,
24						MARS_TASK_EVENT_FLAG_MPU_TO_HOST,
25						MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
26
27		mars_task_event_flag_create(mars_ctx, &mpu_to_mpu_ea,
28						MARS_TASK_EVENT_FLAG_MPU_TO_MPU,
29						MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
30
31		mars_task_create(mars_ctx, &task1_id, "Task 1", task1_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
32		mars_task_create(mars_ctx, &task2_id, "Task 2", task2_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
33
34		task_args.type.u64[0] = host_to_mpu_ea;
35		task_args.type.u64[1] = mpu_to_mpu_ea;
36		mars_task_schedule(&task1_id, &task_args, 0);
37
38		task_args.type.u64[0] = mpu_to_mpu_ea;
39		task_args.type.u64[1] = mpu_to_host_ea;
40		mars_task_schedule(&task2_id, &task_args, 0);
41
42		sleep(1);
43
44		mars_task_event_flag_set(host_to_mpu_ea, 0x1);
45		mars_task_event_flag_wait(mpu_to_host_ea, 0x1, MARS_TASK_EVENT_FLAG_MASK_AND, NULL);
46
47		mars_task_wait(&task1_id, NULL);
48		mars_task_wait(&task2_id, NULL);
49
50		mars_task_destroy(&task1_id);
51		mars_task_destroy(&task2_id);
52
53		mars_task_event_flag_destroy(host_to_mpu_ea);
54		mars_task_event_flag_destroy(mpu_to_host_ea);
55		mars_task_event_flag_destroy(mpu_to_mpu_ea);
56
57		mars_context_destroy(mars_ctx);
58
59		return 0;
60	}
\endcode

<table border="1">

<tr><td><b>Lines:13-15</b></td><td>
Declare 3 instances of the task event flag ea we plan to create.
</td></tr>

<tr><td><b>Lines:19-29</b></td><td>
Create the 3 task event flag instances.

int \ref mars_task_event_flag_create (

<b>arg1</b>:
Pass in the MARS context pointer.

<b>arg2</b>:
Pass in the address of the event flag ea we declared at <b>Lines:13-15</b>.

<b>arg3</b>:
Pass in the direction of events for each instance. The direction must be
MARS_TASK_EVENT_FLAG_HOST_TO_MPU, MARS_TASK_EVENT_FLAG_MPU_TO_HOST, or
MARS_TASK_EVENT_FLAG_MPU_TO_MPU.

<b>arg4</b>:
Pass in the clear mode for each instance. Specify
MARS_TASK_EVENT_FLAG_CLEAR_AUTO so the event flag bits are automatically
cleared when the first task waiting on the event receives the event. Specify
MARS_TASK_EVENT_FLAG_CLEAR_MANUAL to leave the event flag bits set until some
task clears them manually.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

The first event flag is created for host program to task program events.
The second event flag is created for task program to host program events.
The third event flag is created for task program to task program events.

</td></tr>

<tr><td><b>Lines:31-32</b></td><td>
Create the task instance for both the task 1 program and task 2 program.
Specify a context save area size of \ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow
these tasks to context switch.
</td></tr>

<tr><td><b>Lines:34-36</b></td><td>
Initialize the task args we want passed into task 1 program's \ref
mars_task_main function. Store the event flag ea for both host to mpu and
mpu to mpu communication. These event flags will be used to receive events from
the host program and also to send events to task 2 program.
Schedule the task instance for execution, passing in the task args.
</td></tr>

<tr><td><b>Lines:38-40</b></td><td>
Initialize the task args we want passed into task 2 program's \ref
mars_task_main function. Store the event flag ea for both mpu to mpu and
mpu to host communication. These event flags will be used to receive events from
the task 1 program and also to send events to the host program.
Schedule the task instance for execution, passing in the task args.
</td></tr>

<tr><td><b>Line:42</b></td><td>
Sleep for 1 second before continuing. This allows enough time for the tasks to
be scheduled and begin execution. This is only to demonstrate the task entering
the wait state when waiting for a specific event.
</td></tr>

<tr><td><b>Line:44</b></td><td>
Set the event that task 1 is waiting for to allow task 1 to continue execution.

int \ref mars_task_event_flag_set (

<b>arg1</b>:
Pass in the pointer to the event flag instance we created for host to MPU
communication.

<b>arg2</b>:
Pass in the value specifying which bits to set in the event flag. These bits
are logically OR'ed with the bits already set in the event flag.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)
</td></tr>

<tr><td><b>Line:45</b></td><td>
Wait for an event from task 2 before continuing execution.

int \ref mars_task_event_flag_wait (

<b>arg1</b>:
Pass in the ea of the event flag we created for MPU to host communication.

<b>arg2</b>:
Pass in the value specifying which bits to check in the event flag.

<b>arg3</b>:
Pass in the mask mode. Specify MARS_TASK_EVENT_FLAG_MASK_OR to wait for any of
the specified bits to be set. Specify MARS_TASK_EVENT_FLAG_MASK_AND to wait
for all of the specified bits to be set.

<b>arg4</b>:
Pass in NULL because it is not necessary to know the bit status upon returning
from the event wait in this sample.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

If the event flag has not been set by the time of this call, this call will
block until the specific event flag bit is set.
</td></tr>

<tr><td><b>Lines:47-57</b></td><td>
Wait for completion and destroy the 2 task instances and all event flag
instances and finally destroy the MARS context.
</td></tr>

</table>

\n
<b>(task 1 program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	int mars_task_main(const struct mars_task_args *task_args)
 5	{
 6		uint64_t host_to_mpu_ea = task_args->type.u64[0];
 7		uint64_t mpu_to_mpu_ea = task_args->type.u64[1];
 8
 9		mars_task_event_flag_wait(host_to_mpu_ea, 0x1,
10					MARS_TASK_EVENT_FLAG_MASK_AND, NULL);
11
12		printf("MPU(%d): %s - Hello!\n",
13			mars_task_get_kernel_id(), mars_task_get_name());
14
15		mars_task_event_flag_set(mpu_to_mpu_ea, 0x1);
16
17		return 0;
18	}
\endcode

<table border="1">

<tr><td><b>Line:6</b></td><td>
Grab the ea of the event flag initialized in the <b>host program</b> for host to
MPU communication from the task arg structure.
</td></tr>

<tr><td><b>Line:7</b></td><td>
Grab the ea of the event flag initialized in the <b>host program</b> for MPU
to MPU communication from the task arg structure.
</td></tr>

<tr><td><b>Lines:9-10</b></td><td>
Wait for an event from the host program before continuing execution. Make sure
to check for the proper bit set from the host program.

If the event flag has not been set by the time of this call, this task will
enter a wait state and its context will be switched out. When the event flag bit
this task is checking for is set, this task will resume execution and continue.
</td></tr>

<tr><td><b>Line:15</b></td><td>
Set the event that task 2 is waiting for to allow task 2 execution to resume.
</td></tr>

</table>

\n
<b>(task 2 program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	int mars_task_main(const struct mars_task_args *task_args)
 5	{
 6		uint64_t mpu_to_mpu_ea = task_args->type.u64[0];
 7		uint64_t mpu_to_host_ea = task_args->type.u64[1];
 8
 9		mars_task_event_flag_wait(mpu_to_mpu_ea, 0x1,
10					MARS_TASK_EVENT_FLAG_MASK_AND, NULL);
11
12		printf("MPU(%d): %s - Hello!\n",
13			mars_task_get_kernel_id(), mars_task_get_name());
14
15		mars_task_event_flag_set(mpu_to_host_ea, 0x1);
16
17		return 0;
18	}
\endcode

<table border="1">

<tr><td><b>Line:6</b></td><td>
Grab the ea of the event flag initialized in the <b>host program</b> for MPU to
MPU communication from the task arg structure.
</td></tr>

<tr><td><b>Line:7</b></td><td>
Grab the ea of the event flag initialized in the <b>host program</b> for MPU to
host communication from the task arg structure.
</td></tr>

<tr><td><b>Lines:9-10</b></td><td>
Wait for an event from the task 1 program before continuing execution. Make sure
to check for the proper bit set from the task 1 program.

If the event flag has not been set by the time of this call, this task will
enter a wait state and its context will be switched out. When the event flag bit
this task is checking for is set, this task will resume execution and continue.
</td></tr>

<tr><td><b>Line:15</b></td><td>
Set the event that the host program is waiting for to allow the host program
execution to resume.
</td></tr>

</table>

\n
********************************************************************************
\section sec_9_5 9.5 Task Queue Usage

This tutorial will explain how to use the MARS task queue to synchronize
execution between the host program and MARS tasks.

This sample code creates multiple task instances for task 1 program and task
2 program and creates 3 queues.

The first queue is created for host to MPU communication, so the host
program can send data to the task 1 program. The second queue is created for
MPU to MPU communication, so the task 1 program can send data to the task 2
program. The third queue is created for MPU to host communication, so the
task 2 program can send data to the host program.

First the host program creates and schedules all task instances for
execution. It then immediately begins pushing data into the host to MPU queue
for task 1 program to process.

The task 1 program instances wait for data to arrive from the host and pop the
data as it arrives. After popping the data, it handles some processing before
pushing data into the MPU to MPU queue for task 2 program to process.

The task 2 program instances wait for data to arrive from the first task program
and pop the data as it arrives. After popping the data, it handles some
processing before pushing data into the MPU to host queue for the host program
to receive the resulting data.

The program is completed when the host pops and receives all result data from
the task 2 program.

\n
<b>(host program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	#define NUM_TASKS	3
 5	#define NUM_ENTRIES	10
 6	#define QUEUE_DEPTH	(NUM_TASKS * NUM_ENTRIES)
 7
 8	struct queue_entry {
 9		char text[64];
10	};
11
12	static void *task1_program_elf_image;
13	static void *task2_program_elf_image;
14
15	int main(void)
16	{
17		struct mars_context *mars_ctx;
18		struct mars_task_id task1_id[NUM_TASKS];
19		struct mars_task_id task2_id[NUM_TASKS];
20		struct mars_task_args task_args;
21		uint64_t host_to_mpu_ea;
22		uint64_t mpu_to_host_ea;
23		uint64_t mpu_to_mpu_ea;
24		struct queue_entry data;
25		int i;
26
27		mars_context_create(&mars_ctx, 0, 0);
28
29		mars_task_queue_create(mars_ctx, &host_to_mpu_ea,
30					sizeof(struct queue_entry), QUEUE_DEPTH,
31					MARS_TASK_QUEUE_HOST_TO_MPU);
32
33		mars_task_queue_create(mars_ctx, &mpu_to_host_ea,
34					sizeof(struct queue_entry), QUEUE_DEPTH,
35					MARS_TASK_QUEUE_MPU_TO_HOST);
36
37		mars_task_queue_create(mars_ctx, &mpu_to_mpu_ea,
38					sizeof(struct queue_entry), QUEUE_DEPTH,
39					MARS_TASK_QUEUE_MPU_TO_MPU);
40
41		for (i = 0; i < NUM_TASKS; i++) {
42			char name[MARS_TASK_NAME_LEN_MAX];
43
44			snprintf(name, MARS_TASK_NAME_LEN_MAX, "Task 1.%d", i + 1);
45			mars_task_create(mars_ctx, &task1_id[i], name, task1_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
46
47			snprintf(name, MARS_TASK_NAME_LEN_MAX, "Task 2.%d", i + 1);
48			mars_task_create(mars_ctx, &task2_id[i], name, task2_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
49
50			task_args.type.u64[0] = host_to_mpu_ea;
51			task_args.type.u64[1] = mpu_to_mpu_ea;
52			task_args.type.u32[4] = NUM_ENTRIES;
53			mars_task_schedule(&task1_id[i], &task_args, 0);
54
55			task_args.type.u64[0] = mpu_to_mpu_ea;
56			task_args.type.u64[1] = mpu_to_host_ea;
57			task_args.type.u32[4] = NUM_ENTRIES;
58			mars_task_schedule(&task2_id[i], &task_args, 0);
59		}
60
61		for (i = 0; i < QUEUE_DEPTH; i++) {
62			sprintf(data.text, "Host Data %d", i + 1);
63			mars_task_queue_push(host_to_mpu_ea, &data);
64		}
65
66		for (i = 0; i < QUEUE_DEPTH; i++) {
67			mars_task_queue_pop(mpu_to_host_ea, &data);
68			printf("%s\n", data.text);
69		}
70
71		for (i = 0; i < NUM_TASKS; i++) {
72			mars_task_wait(&task1_id[i], NULL);
73			mars_task_wait(&task2_id[i], NULL);
74
75			mars_task_destroy(&task1_id[i]);
76			mars_task_destroy(&task2_id[i]);
77		}
78
79		mars_task_queue_destroy(host_to_mpu_ea);
80		mars_task_queue_destroy(mpu_to_host_ea);
81		mars_task_queue_destroy(mpu_to_mpu_ea);
82
83		mars_context_destroy(mars_ctx);
84
85		return 0;
86	}
\endcode

<table border="1">

<tr><td><b>Lines:8-10</b></td><td>
Define the data entry structure. For this sample this is a 64-byte char array.
</td></tr>

<tr><td><b>Lines:21-23</b></td><td>
Declare 3 instances of the task queue ea.
</td></tr>

<tr><td><b>Line:24</b></td><td>
Declare a local instance of the task queue data entry structure.
</td></tr>

<tr><td><b>Lines:29-39</b></td><td>
Create the 3 task queue instances.

int \ref mars_task_queue_create (

<b>arg1</b>:
Pass in the MARS context pointer.

<b>arg2</b>:
Pass in the address of the queue ea instance.

<b>arg3</b>:
Pass in the size of each queue data entry. The size must be a multiple of 16
and not greater than MARS_TASK_QUEUE_ENTRY_SIZE_MAX.

<b>arg4</b>:
Pass in the depth of the queue, which is the maximum number of data entries
allowed in the queue at any time.

<b>arg5</b>:
Pass in the direction of queue for each instance. The direction must be
MARS_TASK_QUEUE_HOST_TO_MPU, MARS_TASK_QUEUE_MPU_TO_HOST, or
MARS_TASK_QUEUE_MPU_TO_MPU.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

The first queue is created for host program to task program data passing.
The second queue is created for task program to host program data passing.
The third queue is created for task program to task program data passing.

</td></tr>

<tr><td><b>Lines:44-48</b></td><td>
Create multiple task instances for both the task 1 program and task 2 program.
Specify a context save area size of \ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow
these tasks to context switch when there is no data available to be popped from
the queues.
</td></tr>

<tr><td><b>Lines:50-53</b></td><td>
Initialize the task args we want passed into task 1 program's \ref
mars_task_main function. Store the host storage addresses of the queue instances
for both host to mpu and mpu to mpu communication. These queues will be used to
receive data from the host program and also to send data to task 2 program. Also
store the number of data entries task 1 program should expect to process.
Schedule the task instance for execution, passing in the task args.
</td></tr>

<tr><td><b>Lines:55-58</b></td><td>
Initialize the task args we want passed into task 2 program's \ref
mars_task_main function. Store the host storage addresses of the queue instances
for both mpu to mpu and mpu to host communication. These queues will be used to
receive data from the task 1 program and also to send data to the host program.
Also store the number of data entries task 2 program should expect to process.
Schedule the task instance for execution, passing in the task args.
</td></tr>

<tr><td><b>Lines:61-64</b></td><td>
Loop to push data into the queue for the task 1 program to receive. Each data
queue entry structure is initialized with a string identifying its data id
number.

int \ref mars_task_queue_push (

<b>arg1</b>:
Pass in the ea of the task queue instance we created for host to MPU
communication.

<b>arg2</b>:
Pass in the pointer to the data queue entry instance we initialized.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)
</td></tr>

<tr><td><b>Lines:66-69</b></td><td>
Loop to pop data from the queue that task 2 program populates with the final
result data.

int \ref mars_task_queue_pop (

<b>arg1</b>:
Pass in the ea of the task queue instance we created for MPU to host communication.

<b>arg2</b>:
Pass in the pointer to the data queue entry to store the data from the queue.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

On <b>Line:68</b> print the resulting data that has been processed by task 1
program and task 2 program. The final data should be a string identifying the
processing path of the data from host program, to task 1 program, to task 2
program.
</td></tr>

<tr><td><b>Lines:71-83</b></td><td>
Wait for task completion and destroy all task instances and queue instances
and finally destroy the MARS context.
</td></tr>

</table>

\n
<b>(task 1 program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	struct queue_entry {
 5		char text[64];
 6	};
 7
 8	int mars_task_main(const struct mars_task_args *task_args)
 9	{
10		int i;
11		uint64_t host_to_mpu_ea = task_args->type.u64[0];
12		uint64_t mpu_to_mpu_ea = task_args->type.u64[1];
13		uint32_t num_entries = task_args->type.u32[4];
14		struct queue_entry data;
15
16		for (i = 0; i < num_entries; i++) {
17			mars_task_queue_pop(host_to_mpu_ea, &data);
18
19			sprintf(&data.text[strlen(data.text)], " -> %s Data %d",
20				mars_task_get_name(), i + 1);
21
22			mars_task_queue_push(mpu_to_mpu_ea, &data);
23		}
24
25		return 0;
26	}
\endcode

<table border="1">

<tr><td><b>Lines:4-6</b></td><td>
Define the data entry structure. For this sample this is a 64-byte char array.
This is a redefinition of <b>host program Lines:8-10</b>.
</td></tr>

<tr><td><b>Line:11</b></td><td>
Grab the ea of the queue created in the <b>host program</b> for host to
MPU communication from the task arg structure.
</td></tr>

<tr><td><b>Line:12</b></td><td>
Grab the ea of the queue created in the <b>host program</b> for MPU to
MPU communication from the task arg structure.
</td></tr>

<tr><td><b>Line:13</b></td><td>
Grab the number of entries this task needs to pop from the queue and process.
</td></tr>

<tr><td><b>Line:14</b></td><td>
Declare a local instance of the task queue data entry structure.
</td></tr>

<tr><td><b>Line:16</b></td><td>
Loop over the number of data entries this task needs to process, specified by
the task args at <b>Line:13</b>.
</td></tr>

<tr><td><b>Line:17</b></td><td>
Pop data from the queue being sent from the <b>host program</b> to be processed.

If the queue is empty by the time of this call, this task will enter a wait
state and its context will be switched out. When the <b>host program</b> pushes
new data into the queue and it becomes available to be popped, this task will
resume execution and continue.
</td></tr>

<tr><td><b>Lines:19-20</b></td><td>
Take the data string received from the <b>host program</b> and append a string
identifier for this task. Note that strlen() is declared in string.h, which a
complete program would also need to include.
</td></tr>

<tr><td><b>Line:22</b></td><td>
Push the processed data into the queue for the <b>task 2 program</b> to receive.
</td></tr>

</table>

\n
<b>(task 2 program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	struct queue_entry {
 5		char text[64];
 6	};
 7
 8	int mars_task_main(const struct mars_task_args *task_args)
 9	{
10		int i;
11		uint64_t mpu_to_mpu_ea = task_args->type.u64[0];
12		uint64_t mpu_to_host_ea = task_args->type.u64[1];
13		uint32_t num_entries = task_args->type.u32[4];
14		struct queue_entry data;
15
16		for (i = 0; i < num_entries; i++) {
17			mars_task_queue_pop(mpu_to_mpu_ea, &data);
18
19			sprintf(&data.text[strlen(data.text)], " -> %s Data %d",
20				mars_task_get_name(), i + 1);
21
22			mars_task_queue_push(mpu_to_host_ea, &data);
23		}
24
25		return 0;
26	}
\endcode

<table border="1">

<tr><td><b>Lines:4-6</b></td><td>
Define the data entry structure. For this sample this is a 64-byte char array.
This is a redefinition of <b>host program Lines:8-10</b>.
</td></tr>

<tr><td><b>Line:11</b></td><td>
Grab the ea of the queue created in the <b>host program</b> for MPU to MPU
communication from the task arg structure.
</td></tr>

<tr><td><b>Line:12</b></td><td>
Grab the ea of the queue created in the <b>host program</b> for MPU to host
communication from the task arg structure.
</td></tr>

<tr><td><b>Line:13</b></td><td>
Grab the number of entries this task needs to pop from the queue and process.
</td></tr>

<tr><td><b>Line:14</b></td><td>
Declare a local instance of the task queue data entry structure.
</td></tr>

<tr><td><b>Line:16</b></td><td>
Loop over the number of data entries this task needs to process, as specified
by the task_args at <b>Line:13</b>.
</td></tr>

<tr><td><b>Line:17</b></td><td>
Pop data from the queue being sent from the <b>task 1 program</b> to be
processed.

If the queue is empty by the time of this call, this task will enter a wait
state and its context will be switched out. When the <b>task 1 program</b>
pushes new data into the queue and it becomes available to be popped, this
task will resume execution and continue.
</td></tr>

<tr><td><b>Lines:19-20</b></td><td>
Take the data string received from the <b>task 1 program</b> and append a string
identifier for this task.
</td></tr>

<tr><td><b>Line:22</b></td><td>
Push the processed data into the queue for the <b>host program</b> to receive.
</td></tr>

</table>

\n
********************************************************************************
\section sec_9_6 9.6 Task Semaphore Usage

This tutorial will explain how to use the MARS task semaphore to synchronize
modification of a shared resource to avoid having it accessed by multiple
MARS tasks at once.

This sample code creates 10 task instances of the same task program and
creates a single semaphore to protect access of a shared resource integer
counter located in main storage. As each task runs, it tries to obtain the
semaphore and increments the shared resource counter before releasing the
semaphore. Since the shared resource is protected from concurrent accesses, the
resulting value of the counter should equal the total number of tasks, 10,
when the program has completed.

\n
<b>(host program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	#define NUM_TASKS	10
 5
 6	static void *task_program_elf_image;
 7
 8	int main(void)
 9	{
10		struct mars_context *mars_ctx;
11		struct mars_task_id task_id[NUM_TASKS];
12		struct mars_task_args task_args;
13		uint64_t semaphore_ea;
14		uint32_t shared_resource __attribute__((aligned(16)));
15		int i;
16
17		mars_context_create(&mars_ctx, 0, 0);
18
19		mars_task_semaphore_create(mars_ctx, &semaphore_ea, 1);
20
21		shared_resource = 0;
22
23		printf("HOST  : Main() - Shared Resource Counter = %d\n", shared_resource);
24
25		for (i = 0; i < NUM_TASKS; i++) {
26			char name[16];
27			sprintf(name, "Task %d", i);
28
29			mars_task_create(mars_ctx, &task_id[i], name, task_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
30
31			task_args.type.u64[0] = semaphore_ea;
32			task_args.type.u64[1] = mars_ptr_to_ea(&shared_resource);
33			mars_task_schedule(&task_id[i], &task_args, 0);
34		}
35
36		for (i = 0; i < NUM_TASKS; i++) {
37			mars_task_wait(&task_id[i], NULL);
38			mars_task_destroy(&task_id[i]);
39		}
40
41		printf("HOST  : Main() - Shared Resource Counter = %d\n", shared_resource);
42
43	mars_task_semaphore_destroy(semaphore_ea);
44
45		mars_context_destroy(mars_ctx);
46
47		return 0;
48	}
\endcode

<table border="1">

<tr><td><b>Line:11</b></td><td>
Declare an array of 10 task ids, one for each instance of the task program we
plan to create and schedule.
</td></tr>

<tr><td><b>Line:13</b></td><td>
Declare an instance of the task semaphore ea.
</td></tr>

<tr><td><b>Line:14</b></td><td>
Declare an instance of a shared resource counter we plan to modify from various
tasks.
</td></tr>

<tr><td><b>Line:19</b></td><td>
Create the task semaphore instance.

int \ref mars_task_semaphore_create (

<b>arg1</b>:
Pass in the MARS context pointer.

<b>arg2</b>:
Pass in the address of the semaphore ea.

<b>arg3</b>:
Pass in the total number of simultaneous task accesses allowed.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)
</td></tr>

<tr><td><b>Line:21</b></td><td>
Initialize the shared resource counter to 0.
</td></tr>

<tr><td><b>Line:23</b></td><td>
Print the current value of the shared resource counter to stdout.
</td></tr>

<tr><td><b>Line:29</b></td><td>
Create 10 task instances of the task program.
Specify a context save area size of \ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow
these tasks to context switch.
</td></tr>

<tr><td><b>Lines:31-33</b></td><td>
Initialize the task args we want passed into the task program's \ref
mars_task_main function. Store the ea of the semaphore instance.
Also store the host storage address of the shared resource instance, so that
each task can modify it.
</td></tr>

<tr><td><b>Lines:36-39</b></td><td>
Wait for completion and destroy all task instances.
</td></tr>

<tr><td><b>Line:41</b></td><td>
Print the current value of the shared resource counter to stdout. Since each
of the 10 tasks should have incremented the shared resource counter exactly
once, with no simultaneous access allowed, the resulting counter should equal
the number of tasks, 10.
</td></tr>

<tr><td><b>Lines:43-45</b></td><td>
Destroy the semaphore and MARS context.
</td></tr>

</table>

\n
<b>(task program)</b>
\code
 1	#include <mars/task.h>
 2
 3	int mars_task_main(const struct mars_task_args *task_args)
 4	{
 5		uint64_t semaphore_ea = task_args->type.u64[0];
 6		uint64_t shared_resource_ea = task_args->type.u64[1];
 7		uint32_t shared_resource __attribute__((aligned(16)));
 8
 9		mars_task_semaphore_acquire(semaphore_ea);
10
11		get(&shared_resource, shared_resource_ea, sizeof(uint32_t));
12
13		shared_resource++;
14
15		put(&shared_resource, shared_resource_ea, sizeof(uint32_t));
16
17		mars_task_semaphore_release(semaphore_ea);
18
19		return 0;
20	}
\endcode

<table border="1">

<tr><td><b>Line:5</b></td><td>
Grab the ea of the semaphore initialized in the <b>host program</b> from the
task arg structure.
</td></tr>

<tr><td><b>Line:6</b></td><td>
Grab the ea of the shared resource counter declared in the <b>host program</b>.
</td></tr>

<tr><td><b>Line:7</b></td><td>
Declare a local instance of the shared resource counter.
</td></tr>

<tr><td><b>Line:9</b></td><td>
Attempt to acquire access to the semaphore.

int \ref mars_task_semaphore_acquire (

<b>arg1</b>:
Pass in the ea of the semaphore instance initialized at <b>Line:5</b>.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

If the semaphore cannot be acquired at the time of this call, this task will
enter a wait state and its context will be switched out. When the semaphore is
released by another task and available for this task to acquire, this task will
resume execution and continue.
</td></tr>

<tr><td><b>Line:11</b></td><td>
Transfer the shared resource counter instance from host storage to MPU
storage.

The function "get" shown here is a generic place holder for the
platform specific function to do the memory transfer. Please refer to your
platform specific API to learn how to do the memory transfer from host storage
to MPU storage on your specific platform.
</td></tr>

<tr><td><b>Line:13</b></td><td>
Increment the shared resource counter. Since the shared resource is protected by
the semaphore, it is guaranteed that no other tasks have access to the same
shared resource during the time this task holds the semaphore.
</td></tr>

<tr><td><b>Line:15</b></td><td>
Memory transfer from MPU storage to host storage the modified shared resource
counter instance.

The function "put" shown here is a generic place holder for
the platform specific function to do the memory transfer. Please refer to your
platform specific API to learn how to do the memory transfer from MPU storage to
host storage on your specific platform.
</td></tr>

<tr><td><b>Line:17</b></td><td>
Release access to the semaphore.

int \ref mars_task_semaphore_release (

<b>arg1</b>:
Pass in the ea of the semaphore instance.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)
</td></tr>

</table>

\n
********************************************************************************
\section sec_9_7 9.7 Task Signal Usage

This tutorial will explain how to use the MARS task signal to synchronize
between the host program and multiple MARS tasks.

This sample code creates 2 separate task instances. Task 1 can only begin
processing after the host program has waited 1 second and signals for task 1 to
begin. Task 2 can only begin after task 1 has completed its processing and
signals for task 2 to begin. Task 1 must also wait to receive a signal back from
task 2 notifying that it has finished processing before it itself can finish
execution. The host program waits for completion of both tasks before finishing.

\n
<b>(host program)</b>
\code
 1	#include <unistd.h>
 2	#include <mars/task.h>
 3
 4	static void *task1_program_elf_image;
 5	static void *task2_program_elf_image;
 6
 7	int main(void)
 8	{
 9		struct mars_context *mars_ctx;
10		struct mars_task_id task1_id;
11		struct mars_task_id task2_id;
12		struct mars_task_args task_args;
13
14		mars_context_create(&mars_ctx, 0, 0);
15
16		mars_task_create(mars_ctx, &task1_id, "Task 1", task1_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
17		mars_task_create(mars_ctx, &task2_id, "Task 2", task2_program_elf_image, MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
18
19		task_args.type.u64[0] = mars_ptr_to_ea(&task2_id);
20		mars_task_schedule(&task1_id, &task_args, 0);
21
22		task_args.type.u64[0] = mars_ptr_to_ea(&task1_id);
23		mars_task_schedule(&task2_id, &task_args, 0);
24
25		sleep(1);
26
27		mars_task_signal_send(&task1_id);
28
29		mars_task_wait(&task1_id, NULL);
30		mars_task_wait(&task2_id, NULL);
31
32		mars_task_destroy(&task1_id);
33		mars_task_destroy(&task2_id);
34
35		mars_context_destroy(mars_ctx);
36
37		return 0;
38	}
\endcode

<table border="1">

<tr><td><b>Lines:16-17</b></td><td>
Create task instances for task 1 program and task 2 program each with
context save areas.
</td></tr>

<tr><td><b>Lines:19-20</b></td><td>
Initialize the task args we want passed into task 1 program's \ref
mars_task_main function. Store the host storage address of the task id
structure of task 2. Schedule task 1 for execution.
</td></tr>

<tr><td><b>Lines:22-23</b></td><td>
Initialize the task args we want passed into task 2 program's \ref
mars_task_main function. Store the host storage address of the task id
structure of task 1. Schedule task 2 for execution.
</td></tr>

<tr><td><b>Line:25</b></td><td>
Sleep for 1 second before continuing. This allows enough time for the tasks to
be scheduled and begin execution. This is only to demonstrate the task entering
the wait state when waiting for a signal.
</td></tr>

<tr><td><b>Line:27</b></td><td>
Send a signal to task 1 that is waiting for a signal to allow it to continue
execution.

int \ref mars_task_signal_send (

<b>arg1</b>:
Pass in the pointer to the task id instance of the created task we want to
signal.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)
</td></tr>

<tr><td><b>Lines:29-35</b></td><td>
Wait for completion and destroy the 2 task instances and finally destroy the
MARS context.
</td></tr>

</table>

\n
<b>(task 1 program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	int mars_task_main(const struct mars_task_args *task_args)
 5	{
 6		struct mars_task_id task2_id;
 7
 8		get(&task2_id, task_args->type.u64[0], sizeof(struct mars_task_id));
 9
10		mars_task_signal_wait();
11
12		printf("MPU(%d): %s - Hello!\n",
13			mars_task_get_kernel_id(), mars_task_get_name());
14
15		mars_task_signal_send(&task2_id);
16
17		mars_task_signal_wait();
18
19		return 0;
20	}
\endcode

<table border="1">

<tr><td><b>Line:6</b></td><td>
Declare a local task id instance to store task 2's id.
</td></tr>

<tr><td><b>Line:8</b></td><td>
Transfer the task id instance of task 2 from host storage to MPU storage. The
host storage address of the task id for task 2 is obtained from the task_args
passed in from the <b>host program</b>.

The function "get" shown here is a generic place holder for the
platform specific function to do the memory transfer. Please refer to your
platform specific API to learn how to do the memory transfer from host storage
to MPU storage on your specific platform.
</td></tr>

<tr><td><b>Line:10</b></td><td>
Wait for a signal from the <b>host program</b> before continuing execution.

int \ref mars_task_signal_wait (

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)

If a signal has not been set by the time of this call, this task will enter a
wait state and its context will be switched out. When the task receives a
signal, this task will resume execution and continue.
</td></tr>

<tr><td><b>Line:15</b></td><td>
Send a signal to task 2 that is waiting for a signal to allow task 2 execution
to resume.

int \ref mars_task_signal_send (

<b>arg1</b>:
Address of the local task id instance of task 2.

<b>return</b>:
MARS_SUCCESS is returned on success and a negative error value otherwise.

)
</td></tr>

<tr><td><b>Line:17</b></td><td>
Wait for a signal from the <b>task 2 program</b> before continuing
execution.
</td></tr>

</table>

\n
<b>(task 2 program)</b>
\code
 1	#include <stdio.h>
 2	#include <mars/task.h>
 3
 4	int mars_task_main(const struct mars_task_args *task_args)
 5	{
 6		struct mars_task_id task1_id;
 7
 8		get(&task1_id, task_args->type.u64[0], sizeof(struct mars_task_id));
 9
10		mars_task_signal_wait();
11
12		printf("MPU(%d): %s - Hello!\n",
13			mars_task_get_kernel_id(), mars_task_get_name());
14
15		mars_task_signal_send(&task1_id);
16
17		return 0;
18	}
\endcode

<table border="1">

<tr><td><b>Line:6</b></td><td>
Declare a local task id instance to store task 1's id.
</td></tr>

<tr><td><b>Line:8</b></td><td>
Transfer the task id instance of task 1 from host storage to MPU storage. The
host storage address of the task id for task 1 is obtained from the task_args
passed in from the <b>host program</b>.

The function "get" shown here is a generic place holder for the
platform specific function to do the memory transfer. Please refer to your
platform specific API to learn how to do the memory transfer from host storage
to MPU storage on your specific platform.
</td></tr>

<tr><td><b>Line:10</b></td><td>
Wait for a signal from the <b>host program</b> before continuing execution.

If a signal has not been set by the time of this call, this task will enter a
wait state and its context will be switched out. When the task receives a
signal, this task will resume execution and continue.
</td></tr>

<tr><td><b>Line:15</b></td><td>
Send a signal to task 1 that is waiting for a signal to allow task 1 execution
to resume.
</td></tr>

</table>

\n
********************************************************************************
\section sec_9_8 9.8 Task Grayscale Program

This tutorial is a detailed explanation of a program that uses the MARS tasks to
process grayscale conversion of an input image.

In this program, the data partitioning process of the input image is handled by
the MARS main <b>task 1 program</b>, and the actual grayscale conversion process
is handled by multiple instances of the MARS sub <b>task 2 program</b>. As a
result, the major stages of grayscale conversion can all be executed on the
MPUs. The following describes the detailed processing of each program.

The <b>host program</b> is executed as follows:

1. Create a MARS context.\n
2. Create both the main and sub tasks.\n
3. Create a task queue and task event flag to be used for communication between
the tasks.\n
4. Schedule the main task for execution.\n
5. Wait for completion of the main task.\n
6. Destroy the tasks, synchronization objects and MARS context.\n

To create the main task, the <b>host program</b> passes the following
information to the main task:

1. parameters for grayscale conversion processing (effective addresses and
number of pixels of input/output buffers)\n
2. host addresses of the created sub task ids\n
3. host addresses of synchronization objects (queue and event flag) to be
used for communication between the tasks\n

The main <b>task 1 program</b> is executed as follows:

1. Retrieve the data passed from the <b>host program</b> for the sub tasks.\n
2. Schedule instances of the sub tasks for execution.\n
3. Partition the grayscale conversion processing.\n
4. Insert parameters for the partitioned processing into the task queue.\n
5. Wait for completion of the sub tasks using the task event flag.\n

Only the host address of the task queue is passed from the main task to the sub
tasks through the task arguments. Other information is passed to the sub tasks
through the task queue.

The main task and sub tasks pass the following parameters for the partitioned
grayscale conversion processing through the task queue:

1. host addresses of task event flag\n
2. host addresses of partitioned input data\n
3. host addresses of partitioned output data\n
4. number of pixels of partitioned input/output data\n
5. identification numbers to be used for sending completion notification of
partitioned data\n

Finally, the sub <b>task 2 program</b> instances are executed as follows:

1. Get parameters for processing partitioned by main task from the task queue.\n
2. Execute grayscale conversion processing.\n
3. Send completion notification to main task using task event flag.\n

By using MARS, the MPUs can perform all the processing necessary except for the
initialization of the MARS execution environment, and efficient applications for
MPU-centric program execution and control can be created.

In this tutorial program, MARS instances are processed in the function rgb2y()
in the <b>host program</b> so that readers can easily understand the program.
However, this method is not generally recommended: if the function is called
frequently (such as when multiple images are processed in an application), the
MARS instances are initialized every time the function is called, which is very
inefficient. Ideally, programs should be designed so that MARS instances need
to be initialized only once.

\note This tutorial program is written specifically for the Cell B.E.
processor. Implementations for other multicore architectures may differ.

\n
<b>(host program)</b>
\code
  1	#include <stdio.h>
  2	#include <stdlib.h>
  3	#include <string.h>
  4	#include <malloc.h>
  5	#include <sys/stat.h>
  6	#include <libspe2.h>
  7	#include <mars/task.h>
  8
  9	#define IN_FILENAME	"in.ppm"
 10	#define OUT_FILENAME	"out.ppm"
 11	#define PPM_MAGIC	"P6"
 12
 13	#define NUM_TASKS	4
 14	#define QUEUE_DEPTH	4
 15
 16	typedef struct _image_t {
 17		int width;
 18		int height;
 19		unsigned char *src;
 20		unsigned char *dst;
 21	} image_t;
 22
 23	typedef struct {
 24		uint64_t ea_task_id;
 25		uint64_t ea_event;
 26		uint64_t ea_queue;
 27		uint64_t ea_src;
 28		uint64_t ea_dst;
 29		uint32_t num;
 30		uint32_t pad;
 31	} grayscale_params_t;
 32
 33	typedef struct {
 34		uint64_t ea_event;
 35		uint64_t ea_src;
 36		uint64_t ea_dst;
 37		uint32_t num;
 38		uint32_t id;
 39	} grayscale_queue_elem_t;
 40
 41	extern struct spe_program_handle task1_spe_prog;
 42	extern struct spe_program_handle task2_spe_prog;
 43
 44	static struct mars_context *mars_ctx;
 45	static struct mars_task_id task1_id;
 46	static struct mars_task_id task2_id[NUM_TASKS];
 47	static struct mars_task_args task_args;
 48	static uint64_t ea_event;
 49	static uint64_t ea_queue;
 50
 51	static grayscale_params_t grayscale_params __attribute__((aligned(16)));
 52
 53	/* initialize MARS execution environment for rgb2y processing */
 54	void rgb2y(unsigned char *src, unsigned char *dst, int num)
 55	{
 56		int ret, i;
 57
 58		ret = mars_context_create(&mars_ctx, 0, 0);
 59		if (ret) {
 60			printf("Could not create MARS context! (%d)\n", ret);
 61			exit(1);
 62		}
 63
 64		ret = mars_task_event_flag_create(mars_ctx, &ea_event,
 65						MARS_TASK_EVENT_FLAG_MPU_TO_MPU,
 66						MARS_TASK_EVENT_FLAG_CLEAR_AUTO);
 67		if (ret) {
 68			printf("Could not create MARS task event flag! (%d)\n", ret);
 69			exit(1);
 70		}
 71
 72		ret = mars_task_queue_create(mars_ctx, &ea_queue,
 73						sizeof(grayscale_queue_elem_t),
 74						QUEUE_DEPTH,
 75						MARS_TASK_QUEUE_MPU_TO_MPU);
 76		if (ret) {
 77			printf("Could not create MARS task queue! (%d)\n", ret);
 78			exit(1);
 79		}
 80
 81		ret = mars_task_create(mars_ctx, &task1_id,
 82					"Grayscale Main Task",
 83					task1_spe_prog.elf_image,
 84					MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
 85		if (ret) {
 86			printf("Could not create MARS main task! (%d)\n", ret);
 87			exit(1);
 88		}
 89
 90		for (i = 0; i < NUM_TASKS; i++) {
 91			ret = mars_task_create(mars_ctx, &task2_id[i],
 92						"Grayscale Sub Task",
 93						task2_spe_prog.elf_image,
 94						MARS_TASK_CONTEXT_SAVE_SIZE_MAX);
 95			if (ret) {
 96				printf("Could not create MARS sub task! (%d)\n", ret);
 97				exit(1);
 98			}
 99		}
100
101		/* initialize grayscale params */
102		grayscale_params.ea_task_id = mars_ptr_to_ea(&task2_id);
103		grayscale_params.ea_event   = ea_event;
104		grayscale_params.ea_queue   = ea_queue;
105		grayscale_params.ea_src     = mars_ptr_to_ea(src);
106		grayscale_params.ea_dst     = mars_ptr_to_ea(dst);
107		grayscale_params.num        = num;
108
109		/* initialize task args */
110		task_args.type.u64[0] = mars_ptr_to_ea(&grayscale_params);
111
112		ret = mars_task_schedule(&task1_id, &task_args, 0);
113		if (ret) {
114			printf("Could not schedule MARS main task! (%d)\n", ret);
115			exit(1);
116		}
117
118		ret = mars_task_wait(&task1_id, NULL);
119		if (ret) {
120			printf("Could not wait for MARS main task! (%d)\n", ret);
121			exit(1);
122		}
123
124		ret = mars_task_destroy(&task1_id);
125		if (ret) {
126			printf("Could not destroy MARS main task! (%d)\n", ret);
127			exit(1);
128		}
129
130		for (i = 0; i < NUM_TASKS; i++) {
131			ret = mars_task_destroy(&task2_id[i]);
132			if (ret) {
133				printf("Could not destroy MARS sub task! (%d)\n", ret);
134				exit(1);
135			}
136		}
137
138		ret = mars_context_destroy(mars_ctx);
139		if (ret) {
140			printf("Could not destroy MARS context! (%d)\n", ret);
141			exit(1);
142		}
143	}
144
145	/* read ppm data from input file */
146	void read_ppm(image_t *img, char *fname)
147	{
148		char *token, *pc, *buf, *del = " \t\n";
149		int i, w, h, luma, pixs, filesize;
150		struct stat st;
151		unsigned char *dot;
152		FILE *fp;
153
154		/* read raw data */
155		stat(fname, &st);
156		filesize = (int) st.st_size;
157		buf = (char *) malloc(filesize * sizeof(char));
158
159		if ((fp = fopen(fname, "r")) == NULL) {
160			fprintf(stderr, "error: failed to open file %s\n", fname);
161			exit(1);
162		}
163
164		fseek(fp, 0, SEEK_SET);
165		fread(buf, filesize * sizeof(char), 1, fp);
166		fclose(fp);
167
168		/* validate file format */
169		token = (char *) (unsigned long) strtok(buf, del);
170		if (strncmp(token, PPM_MAGIC, 2) != 0) {
171			fprintf(stderr, "error: invalid file format\n");
172			exit(1);
173		}
174
175		/* skip comments */
176		token = (char *) (unsigned long) strtok(NULL, del);
177		if (token[0] == '#') {
178			token = (char *) (unsigned long) strtok(NULL, "\n");
179			token = (char *) (unsigned long) strtok(NULL, del);
180		}
181
182		/* read picture size (and luma) */
183		w = strtoul(token, &pc, 10);
184		token = (char *) (unsigned long) strtok(NULL, del);
185		h = strtoul(token, &pc, 10);
186		token = (char *) (unsigned long) strtok(NULL, del);
187		luma = strtoul(token, &pc, 10);
188
189		img->width = w;
190		img->height = h;
191
192		/* allocate an aligned memory */
193		pixs = w * h;
194		img->src = (unsigned char *)memalign(16, pixs*4);
195		img->dst = (unsigned char *)memalign(16, pixs*4);
196
197		/* read rgb data with 'r,g,b,0' formatted */
198		dot = img->src;
199		pc++;
200		for (i = 0; i < pixs*4; i++) {
201			if (i % 4 == 3) {
202				*dot++ = 0;
203			} else {
204				*dot++ = *pc++;
205			}
206		}
207
208		return;
209	}
210
211	/* write ppm data to output file */
212	void write_ppm(image_t *img, char *fname)
213	{
214		int i;
215		int w = img->width;
216		int h = img->height;
217		unsigned char *dot = img->dst;
218		FILE *fp;
219
220		if ((fp = fopen(fname, "wb+")) == NULL) {
221			fprintf(stderr, "failed to open file %s\n", fname);
222			exit(1);
223		}
224
225		fprintf(fp, "%s\n", PPM_MAGIC);
226		fprintf(fp, "%d %d\n", w, h);
227		fprintf(fp, "255\n");
228
229		for (i = 0; i < (w * h * 4); i++) {
230			if (i % 4 == 3) {
231				dot++;
232			} else {
233				putc((int) *dot++, fp);
234			}
235		}
236
237		fclose(fp);
238
239		return;
240	}
241
242	void delete_image(image_t *img)
243	{
244		free(img->src);
245		free(img->dst);
246
247		return;
248	}
249
250	int main(int argc, char **argv)
251	{
252		image_t image;
253
254		printf(INFO);
255
256		read_ppm(&image, IN_FILENAME);
257
258		rgb2y(image.src, image.dst, image.width * image.height);
259
260		write_ppm(&image, OUT_FILENAME);
261
262		delete_image(&image);
263
264		return 0;
265	}
\endcode

<table border="1">

<tr><td><b>Line:9</b></td><td>
Filename of input source image.
</td></tr>

<tr><td><b>Line:10</b></td><td>
Filename of the output image.
</td></tr>

<tr><td><b>Line:13</b></td><td>
Define the number of sub <b>task 2 program</b> instances as the constant
NUM_TASKS. In this tutorial, 4 instances of the sub task are created, and the
grayscale conversion processing is divided among them.
</td></tr>

<tr><td><b>Line:14</b></td><td>
Define the depth of the task queue as a constant QUEUE_DEPTH. In this tutorial,
the depth of the task queue is set to 4 in accordance with the number of the sub
task instances.
</td></tr>

<tr><td><b>Lines:16-21</b></td><td>
Define the structure to store the image information.
</td></tr>

<tr><td><b>Lines:23-31</b></td><td>
Define the structure of the parameter set for storing the information to be
passed into the main <b>task 1 program</b>.
</td></tr>

<tr><td><b>Lines:33-39</b></td><td>
Define the structure for the task queue data element. Each entry in the task
queue will be an instance of this structure. The size of this structure must be
a multiple of 16 bytes.
</td></tr>

<tr><td><b>Line:51</b></td><td>
Declare an instance of the structure we defined at <b>Lines:23-31</b> for
passing parameters to the main task.
</td></tr>

<tr><td><b>Line:54</b></td><td>
This function handles the grayscale processing of the input image data buffer
and outputs the result to the destination buffer.
</td></tr>

<tr><td><b>Lines:58-62</b></td><td>
Create the MARS context.
</td></tr>

<tr><td><b>Lines:64-70</b></td><td>
Create the task event flag instance for MPU to MPU communication. This will
be used by the sub task instances to notify the main task that their portion
of the grayscale processing is completed.
</td></tr>

<tr><td><b>Lines:72-79</b></td><td>
Create the task queue instance for MPU to MPU communication. This will
be used by the main task to send grayscale processing requests to the sub tasks.
</td></tr>

<tr><td><b>Lines:81-88</b></td><td>
Create the task for the main <b>task 1 program</b>.
Specify a context save area size of \ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow the
main task to context switch.
</td></tr>

<tr><td><b>Lines:90-99</b></td><td>
Create multiple instances for the sub <b>task 2 program</b>. Specify a context
save area size of \ref MARS_TASK_CONTEXT_SAVE_SIZE_MAX to allow the sub tasks to
context switch.
</td></tr>

<tr><td><b>Lines:101-107</b></td><td>
Initialize the parameters for grayscale conversion processing in the parameter
structure declared at <b>Line:51</b>. The parameters stored in this structure
are the host addresses of the created sub task ids, the task event flag, and
the task queue; the storage areas for the input/output image data; and the
total number of pixels of image data. The host address of this structure is
passed to the main task through the task argument.
</td></tr>

<tr><td><b>Lines:112-116</b></td><td>
Schedule the main task for execution.
</td></tr>

<tr><td><b>Lines:118-122</b></td><td>
Wait for the main task to complete execution.
</td></tr>

<tr><td><b>Lines:124-128</b></td><td>
Destroy the main task instance.
</td></tr>

<tr><td><b>Lines:130-136</b></td><td>
Destroy the sub task instances.
</td></tr>

<tr><td><b>Lines:138-143</b></td><td>
Destroy the MARS context.
</td></tr>

<tr><td><b>Lines:146-209</b></td><td>
This function reads the input source image file from <b>Line:9</b> and stores
the image data into the structure defined at <b>Lines:16-21</b>.
</td></tr>

<tr><td><b>Lines:212-240</b></td><td>
This function writes the output grayscaled image data to the output image
file from <b>Line:10</b>.
</td></tr>

<tr><td><b>Lines:242-248</b></td><td>
This function cleans up an instance of the image data structure.
</td></tr>

<tr><td><b>Lines:250-265</b></td><td>
This is the entry function of the <b>host program</b> that does the following:

1. Read the input image data from the input image file.\n
2. Process the grayscale conversion of the input image data.\n
3. Write the output image data to the output image file.\n
4. Clean up the image data instance.\n
</td></tr>

</table>

\n
<b>(task 1 program)</b>
\code
  1	#include <stdio.h>
  2	#include <stdint.h>
  3	#include <spu_intrinsics.h>
  4	#include <spu_mfcio.h>
  5	#include <mars/task.h>
  6
  7	#define NUM_TASKS	4
  8
  9	#define ALIGN4_UP(x)	(((x) + 0x3) & ~0x3)
 10
 11	typedef struct {
 12		uint64_t ea_task_id;
 13		uint64_t ea_event;
 14		uint64_t ea_queue;
 15		uint64_t ea_src;
 16		uint64_t ea_dst;
 17		uint32_t num;
 18		uint32_t pad;
 19	} grayscale_params_t;
 20
 21	typedef struct {
 22		uint64_t ea_event;
 23		uint64_t ea_src;
 24		uint64_t ea_dst;
 25		uint32_t num;
 26		uint32_t id;
 27	} grayscale_queue_elem_t;
 28
 29	static struct mars_task_id task2_id[NUM_TASKS];
 30	static struct mars_task_args task2_args;
 31
 32	static grayscale_params_t grayscale_params __attribute__((aligned(16)));
 33	static grayscale_queue_elem_t data __attribute__((aligned(16)));
 34
 35	int mars_task_main(const struct mars_task_args *task_args)
 36	{
 37		int ret, i, tag = 0;
 38		int num, remain, chunk;
 39		uint64_t ea_task_id, ea_event, ea_queue;
 40		uint64_t ea_src, ea_dst;
 41		uint16_t mask = 0;
 42
 43		/* Get application parameters */
 44		mfc_get(&grayscale_params, task_args->type.u64[0], sizeof(grayscale_params_t), tag, 0, 0);
 45		mfc_write_tag_mask(1 << tag);
 46		mfc_read_tag_status_all();
 47
 48		ea_task_id = grayscale_params.ea_task_id;
 49		ea_event   = grayscale_params.ea_event;
 50		ea_queue   = grayscale_params.ea_queue;
 51		ea_src     = grayscale_params.ea_src;
 52		ea_dst     = grayscale_params.ea_dst;
 53		num        = grayscale_params.num;
 54
 55		/* Get sub task ids */
 56		mfc_get(&task2_id, ea_task_id, sizeof(struct mars_task_id) * NUM_TASKS, tag, 0, 0);
 57		mfc_write_tag_mask(1 << tag);
 58		mfc_read_tag_status_all();
 59
 60		/* Pass queue ea to sub task args */
 61		task2_args.type.u64[0] = ea_queue;
 62
 63		/* Schedule sub tasks for execution */
 64		for (i = 0; i < NUM_TASKS; i++) {
 65			ret = mars_task_schedule(&task2_id[i], &task2_args, 0);
 66			if (ret) {
 67				printf("Could not schedule MARS sub task! (%d)\n", ret);
 68				return 1;
 69			}
 70		}
 71
 72		remain = num;
 73		chunk = num/NUM_TASKS;
 74		for (i = 0; i < NUM_TASKS; i++) {
 75			data.ea_event = ea_event;
 76			data.ea_src   = ea_src;
 77			data.ea_dst   = ea_dst;
 78			data.id       = i;
 79			if (remain > chunk) {
 80				data.num = ALIGN4_UP(chunk);
 81			} else {
 82				data.num = ALIGN4_UP(remain);
 83			}
 84
 85			/* Push data to queue */
 86			ret = mars_task_queue_push_begin(ea_queue, &data, tag);
 87			if (ret) {
 88				printf("Could not push data to MARS task queue! (%d)\n", ret);
 89				return 1;
 90			}
 91			ret = mars_task_queue_push_end(ea_queue, tag);
 92			if (ret) {
 93				printf("Could not complete data push to MARS task queue! (%d)\n", ret);
 94				return 1;
 95			}
 96
 97			remain -= chunk;
 98			ea_src += (chunk * 4);
 99			ea_dst += (chunk * 4);
100
101			/* Create event mask */
102			mask |= 1 << i;
103		}
104
105		/* Wait until specified bits are set to event flag */
106		ret = mars_task_event_flag_wait(ea_event, mask, MARS_TASK_EVENT_FLAG_MASK_AND, NULL);
107		if (ret) {
108			printf("Could not wait for MARS task event flag! (%d)\n", ret);
109			return 1;
110		}
111
112		/* Wait for all scheduled sub tasks to complete */
113		for (i = 0; i < NUM_TASKS; i++) {
114			ret = mars_task_wait(&task2_id[i], NULL);
115			if (ret) {
116				printf("Could not wait for MARS sub task! (%d)\n", ret);
117				return 1;
118			}
119		}
120
121		return 0;
122	}
\endcode

<table border="1">

<tr><td><b>Line:7</b></td><td>
Define the number of sub <b>task 2 program</b> instances that need to be
scheduled for execution. This number must match the one specified in the
<b>host program</b> at <b>Line:13</b>.
</td></tr>

<tr><td><b>Lines:11-19</b></td><td>
Define the structure for the parameters passed in from the <b>host program</b>.
This is a redefinition of the same structure defined in the <b>host program</b>
at <b>Lines:23-31</b>.
</td></tr>

<tr><td><b>Lines:21-27</b></td><td>
Define the structure for the task queue entry data.
This is a redefinition of the same structure defined in the <b>host program</b>
at <b>Lines:33-38</b>.
</td></tr>

<tr><td><b>Lines:29-30</b></td><td>
Declare an array of task ids and an instance of a task arg structure that will
be passed into the sub task.
</td></tr>

<tr><td><b>Lines:32-33</b></td><td>
Declare an instance of the parameter structure to be passed in from the <b>host
program</b> and an instance of the task queue data entry structure.
</td></tr>

<tr><td><b>Lines:44-46</b></td><td>
Memory transfer the grayscale parameter structure from the host storage address
specified in the task args sent from the <b>host program</b>.
</td></tr>

<tr><td><b>Lines:48-53</b></td><td>
Initialize the local variables with the parameters from the <b>host program</b>.
</td></tr>

<tr><td><b>Lines:56-58</b></td><td>
Memory transfer the array of sub task ids from the host storage address
specified in the task args sent from the <b>host program</b>.
</td></tr>

<tr><td><b>Line:61</b></td><td>
Initialize the task args to pass into the sub task and give it the host address
of the task queue.
</td></tr>

<tr><td><b>Lines:64-70</b></td><td>
Schedule all the instances of the sub tasks for execution.
</td></tr>

<tr><td><b>Lines:72-103</b></td><td>
Partition the source image data evenly among the sub task instances. Push the
partitioned data into the task queue so that each sub task can pop it and begin
processing. Each queue entry carries the host addresses and pixel count of its
input/output data partition, the host address of the task event flag, and the
identification number of the sub task.
</td></tr>

<tr><td><b>Lines:106-110</b></td><td>
Wait for the task event flag notification indicating that all sub tasks have
completed their processing. The main task enters a wait state until the event
is received.

\note In this tutorial, the task event flag is used only for demonstration
purposes, to notify the main task of sub task completion. Since the main task
waits for completion of all sub tasks immediately after waiting for the task
event flag, the event flag wait is not strictly necessary.
</td></tr>

<tr><td><b>Lines:113-119</b></td><td>
Wait for completion of all sub tasks.
</td></tr>

</table>

\n
<b>(task 2 program)</b>
\code
  1	#include <stdio.h>
  2	#include <stdint.h>
  3	#include <spu_intrinsics.h>
  4	#include <spu_mfcio.h>
  5	#include <mars/task.h>
  6
  7	#define MAX_BUFSIZE	(16 << 10)
  8
  9	typedef struct {
 10		uint64_t ea_event;
 11		uint64_t ea_src;
 12		uint64_t ea_dst;
 13		uint32_t num;
 14		uint32_t id;
 15	} grayscale_queue_elem_t;
 16
 17	static unsigned char src_spe[MAX_BUFSIZE] __attribute__((aligned(128)));
 18	static unsigned char dst_spe[MAX_BUFSIZE] __attribute__((aligned(128)));
 19
 20	static grayscale_queue_elem_t data __attribute__((aligned(16)));
 21
 22	void rgb2y(unsigned char *src, unsigned char *dst, int num)
 23	{
 24		int i;
 25
 26		__vector unsigned char *vsrc = (__vector unsigned char *) src;
 27		__vector unsigned char *vdst = (__vector unsigned char *) dst;
 28
 29		__vector unsigned int vr, vg, vb, vy, vpat;
 30		__vector float vfr, vfg, vfb, vfy;
 31
 32		__vector float vrconst = spu_splats(0.29891f);
 33		__vector float vgconst = spu_splats(0.58661f);
 34		__vector float vbconst = spu_splats(0.11448f);
 35		__vector float vfzero = spu_splats(0.0f);
 36		__vector unsigned int vmax = spu_splats((unsigned int) 255);
 37
 38		__vector unsigned char vpatr = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x00,
 39									0x10, 0x10, 0x10, 0x04,
 40									0x10, 0x10, 0x10, 0x08,
 41									0x10, 0x10, 0x10, 0x0c };
 42		__vector unsigned char vpatg = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x01,
 43									0x10, 0x10, 0x10, 0x05,
 44									0x10, 0x10, 0x10, 0x09,
 45									0x10, 0x10, 0x10, 0x0d };
 46		__vector unsigned char vpatb = (__vector unsigned char) { 0x10, 0x10, 0x10, 0x02,
 47									0x10, 0x10, 0x10, 0x06,
 48									0x10, 0x10, 0x10, 0x0a,
 49									0x10, 0x10, 0x10, 0x0e };
 50		__vector unsigned char vpaty = (__vector unsigned char) { 0x03, 0x03, 0x03, 0x10,
 51									0x07, 0x07, 0x07, 0x10,
 52									0x0b, 0x0b, 0x0b, 0x10,
 53									0x0f, 0x0f, 0x0f, 0x10 };
 54		__vector unsigned char vzero = spu_splats((unsigned char) 0);
 55
 56		for (i = 0; i < num/4; i++) {
 57			vr = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatr);
 58			vg = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatg);
 59			vb = (__vector unsigned int) spu_shuffle(vsrc[i], vzero, vpatb);
 60
 61			vfr = spu_convtf(vr, 0);
 62			vfg = spu_convtf(vg, 0);
 63			vfb = spu_convtf(vb, 0);
 64
 65			vfy = spu_madd(vfr, vrconst, vfzero);
 66			vfy = spu_madd(vfg, vgconst, vfy);
 67			vfy = spu_madd(vfb, vbconst, vfy);
 68		
 69			vy = spu_convtu(vfy, 0);
 70
 71			vpat = spu_cmpgt(vy, vmax);
 72			vy = spu_sel(vy, vmax, vpat);
 73
 74			vdst[i] = (__vector unsigned char) spu_shuffle(vy, (__vector unsigned int) vzero, vpaty);
 75		}
 76
 77		return;
 78	}
 79
 80	int mars_task_main(const struct mars_task_args *task_args)
 81	{
 82		int ret, tag = 0;
 83		int my_id;
 84		uint64_t ea_event, ea_queue;
 85		uint16_t bits;
 86		uint64_t ea_src, ea_dst;
 87		unsigned int remain, num;
 88
 89		ea_queue = task_args->type.u64[0];
 90
 91		/* Pop data from queue */
 92		ret = mars_task_queue_pop_begin(ea_queue, &data, tag);
 93		if (ret) {
 94			printf("Could not pop data from MARS task queue! (%d)\n", ret);
 95			return 1;
 96		}
 97		ret = mars_task_queue_pop_end(ea_queue, tag);
 98		if (ret) {
 99			printf("Could not complete data pop from MARS task queue! (%d)\n", ret);
100			return 1;
101		}
102
103		my_id    = data.id;
104		ea_event = data.ea_event;
105		ea_src   = data.ea_src;
106		ea_dst   = data.ea_dst;
107		remain   = data.num;
108
109		/* main loop */
110		while (remain > 0) {
111			if (remain > MAX_BUFSIZE/4) {
112				num = MAX_BUFSIZE/4;
113			} else {
114				num = remain;
115			}
116
117			/* DMA Transfer : GET input data */
118			mfc_get(src_spe, ea_src, num * 4, tag, 0, 0);
119			mfc_write_tag_mask(1 << tag);
120			mfc_read_tag_status_all();
121
122			/* convert to grayscale data */
123			rgb2y(src_spe, dst_spe, num);
124
125			/* DMA Transfer : PUT output data */
126			mfc_put(dst_spe, ea_dst, num * 4, tag, 0, 0);
127			mfc_write_tag_mask(1 << tag);
128			mfc_read_tag_status_all();
129
130			remain -= num;
131			ea_src += num * 4;
132			ea_dst += num * 4;
133		}
134
 135		/* Set bit in MARS task event flag */
136		bits = 1 << my_id;
137		ret = mars_task_event_flag_set(ea_event, bits);
138		if (ret) {
139			printf("Could not set MARS task event flag! (%d)\n", ret);
140			return 1;
141		}
142
143		return 0;
144	}
\endcode

<table border="1">

<tr><td><b>Lines:9-15</b></td><td>
Define the structure for the task queue entry data. This is a redefinition of
the same structure defined in the <b>host program</b> at <b>Lines:33-38</b> as
well as in the <b>task 1 program</b> at <b>Lines:11-19</b>.
</td></tr>

<tr><td><b>Lines:17-18</b></td><td>
Declare instances of the source and destination buffer to store the processing
input/output data.
</td></tr>

<tr><td><b>Lines:22-78</b></td><td>
This function handles the grayscale processing of the partitioned input
image data in the source buffer and stores the output to the destination buffer.
</td></tr>

<tr><td><b>Line:89</b></td><td>
Get the host address of the task queue passed in from the main task.
</td></tr>

<tr><td><b>Lines:92-101</b></td><td>
Pop the data from the task queue. If the main task has not pushed data into
the task queue by the time of this call, this task will enter a wait state and
its context will be switched out. When the task is able to pop data from the
task queue, this task will resume execution and continue.
</td></tr>

<tr><td><b>Lines:103-107</b></td><td>
Initialize the local variables with the parameters from the data popped from
the task queue.
</td></tr>

<tr><td><b>Line:110</b></td><td>
Loop until all the input data has been processed. The amount of data processed
in each loop iteration is limited by the local buffer size specified at
<b>Line:7</b>.
</td></tr>

<tr><td><b>Lines:118-120</b></td><td>
Memory transfer the input data from host storage to the source buffer declared
at <b>Line:17</b>.
</td></tr>

<tr><td><b>Line:123</b></td><td>
Do the actual grayscale processing of image data from source buffer to
destination buffer.
</td></tr>

<tr><td><b>Lines:126-128</b></td><td>
Memory transfer the output data from the destination buffer declared at
<b>Line:18</b> to host storage.
</td></tr>

<tr><td><b>Lines:136-141</b></td><td>
Set the task event flag bit corresponding to this sub task's identification number.
</td></tr>

</table>

\n
********************************************************************************
<hr>
********************************************************************************
********************************************************************************
\section sec_10 10 API Reference

This section describes the MARS API.

- Host Library API
 - Base Management
  - \ref mars_malloc
  - \ref mars_realloc
  - \ref mars_alloca_align
  - \ref mars_free
  - \ref mars_ea_memalign
  - \ref mars_ea_free
  - \ref mars_ea_get
  - \ref mars_ea_get_uint16
  - \ref mars_ea_get_uint32
  - \ref mars_ea_get_uint64
  - \ref mars_ea_put
  - \ref mars_ea_put_uint16
  - \ref mars_ea_put_uint32
  - \ref mars_ea_put_uint64
  - \ref mars_ea_map
  - \ref mars_ea_unmap
  - \ref mars_ea_sync
  - \ref mars_ea_to_ptr
  - \ref mars_ptr_to_ea
  - \ref mars_get_ticks
 - Context Management
  - \ref mars_context_create
  - \ref mars_context_destroy
  - \ref mars_context_get_num_mpus
 - Mutex Management
  - \ref mars_mutex_create
  - \ref mars_mutex_destroy
  - \ref mars_mutex_reset
  - \ref mars_mutex_lock
  - \ref mars_mutex_lock_get
  - \ref mars_mutex_unlock
  - \ref mars_mutex_unlock_put
 - Workload Model Management
  - \ref mars_workload_queue_add_begin
  - \ref mars_workload_queue_add_end
  - \ref mars_workload_queue_remove_begin
  - \ref mars_workload_queue_remove_end
  - \ref mars_workload_queue_schedule_begin
  - \ref mars_workload_queue_schedule_end
  - \ref mars_workload_queue_unschedule_begin
  - \ref mars_workload_queue_unschedule_end
  - \ref mars_workload_queue_wait
  - \ref mars_workload_queue_try_wait
  - \ref mars_workload_queue_signal_send
 - Task Management
  - \ref mars_task_create
  - \ref mars_task_destroy
  - \ref mars_task_schedule
  - \ref mars_task_wait
  - \ref mars_task_try_wait
  - \ref mars_task_get_ticks
 - Task Synchronization
  - Barrier
   - \ref group_mars_task_barrier "mars_task_barrier_create"
   - \ref group_mars_task_barrier "mars_task_barrier_destroy"
  - Event Flag
   - \ref group_mars_task_event_flag "mars_task_event_flag_create"
   - \ref group_mars_task_event_flag "mars_task_event_flag_destroy"
   - \ref group_mars_task_event_flag "mars_task_event_flag_clear"
   - \ref group_mars_task_event_flag "mars_task_event_flag_set"
   - \ref group_mars_task_event_flag "mars_task_event_flag_wait"
   - \ref group_mars_task_event_flag "mars_task_event_flag_try_wait"
  - Queue
   - \ref group_mars_task_queue "mars_task_queue_create"
   - \ref group_mars_task_queue "mars_task_queue_destroy"
   - \ref group_mars_task_queue "mars_task_queue_count"
   - \ref group_mars_task_queue "mars_task_queue_clear"
   - \ref group_mars_task_queue "mars_task_queue_push"
   - \ref group_mars_task_queue "mars_task_queue_try_push"
   - \ref group_mars_task_queue "mars_task_queue_pop"
   - \ref group_mars_task_queue "mars_task_queue_try_pop"
   - \ref group_mars_task_queue "mars_task_queue_peek"
   - \ref group_mars_task_queue "mars_task_queue_try_peek"
  - Semaphore
   - \ref group_mars_task_semaphore "mars_task_semaphore_create"
   - \ref group_mars_task_semaphore "mars_task_semaphore_destroy"
  - Signal
   - \ref group_mars_task_signal "mars_task_signal_send"

- MPU Library API
 - Workload Model Management
  - \ref mars_module_main
  - \ref mars_module_get_ticks
  - \ref mars_module_get_mars_context_ea
  - \ref mars_module_get_kernel_id
  - \ref mars_module_get_workload_id
  - \ref mars_module_get_workload
  - \ref mars_module_get_workload_by_id
  - \ref mars_module_workload_query
  - \ref mars_module_workload_wait_set
  - \ref mars_module_workload_wait_reset
  - \ref mars_module_workload_signal_set
  - \ref mars_module_workload_signal_reset
  - \ref mars_module_workload_schedule_begin
  - \ref mars_module_workload_schedule_end
  - \ref mars_module_workload_unschedule_begin
  - \ref mars_module_workload_unschedule_end
  - \ref mars_module_workload_wait
  - \ref mars_module_workload_yield
  - \ref mars_module_workload_finish
  - \ref mars_module_host_signal_send
  - \ref mars_module_host_callback_set
  - \ref mars_module_host_callback_reset
  - \ref mars_module_mutex_lock_get
  - \ref mars_module_mutex_unlock_put
  - \ref mars_module_dma_get
  - \ref mars_module_dma_put
  - \ref mars_module_dma_wait
 - Task Management
  - \ref mars_task_main
  - \ref mars_task_exit
  - \ref mars_task_yield
  - \ref mars_task_schedule
  - \ref mars_task_unschedule
  - \ref mars_task_wait
  - \ref mars_task_try_wait
  - \ref mars_task_call_host
  - \ref mars_task_get_kernel_id
  - \ref mars_task_get_id
  - \ref mars_task_get_name
  - \ref mars_task_get_ticks
 - Task Synchronization
  - Barrier
   - \ref group_mars_task_barrier "mars_task_barrier_notify"
   - \ref group_mars_task_barrier "mars_task_barrier_wait"
   - \ref group_mars_task_barrier "mars_task_barrier_try_wait"
  - Event Flag
   - \ref group_mars_task_event_flag "mars_task_event_flag_clear"
   - \ref group_mars_task_event_flag "mars_task_event_flag_set"
   - \ref group_mars_task_event_flag "mars_task_event_flag_wait"
   - \ref group_mars_task_event_flag "mars_task_event_flag_try_wait"
  - Queue
   - \ref group_mars_task_queue "mars_task_queue_count"
   - \ref group_mars_task_queue "mars_task_queue_clear"
   - \ref group_mars_task_queue "mars_task_queue_push"
   - \ref group_mars_task_queue "mars_task_queue_push_begin"
   - \ref group_mars_task_queue "mars_task_queue_push_end"
   - \ref group_mars_task_queue "mars_task_queue_try_push"
   - \ref group_mars_task_queue "mars_task_queue_try_push_begin"
   - \ref group_mars_task_queue "mars_task_queue_pop"
   - \ref group_mars_task_queue "mars_task_queue_pop_begin"
   - \ref group_mars_task_queue "mars_task_queue_pop_end"
   - \ref group_mars_task_queue "mars_task_queue_try_pop"
   - \ref group_mars_task_queue "mars_task_queue_try_pop_begin"
   - \ref group_mars_task_queue "mars_task_queue_peek"
   - \ref group_mars_task_queue "mars_task_queue_peek_begin"
   - \ref group_mars_task_queue "mars_task_queue_peek_end"
   - \ref group_mars_task_queue "mars_task_queue_try_peek"
   - \ref group_mars_task_queue "mars_task_queue_try_peek_begin"
  - Semaphore
   - \ref group_mars_task_semaphore "mars_task_semaphore_acquire"
   - \ref group_mars_task_semaphore "mars_task_semaphore_release"
  - Signal
   - \ref group_mars_task_signal "mars_task_signal_send"
   - \ref group_mars_task_signal "mars_task_signal_wait"
   - \ref group_mars_task_signal "mars_task_signal_try_wait"

\n
********************************************************************************
**/

/**
********************************************************************************
\defgroup group_mars_base Base Management API

********************************************************************************
**/

/**
********************************************************************************
\defgroup group_mars_context Context Management API

The MARS context holds all necessary information and data for each MARS
instance initialized in the system.

Before any of the MARS functionalities can be utilized, an instance of a MARS
context must be initialized. When the system is completely done with MARS
functionality, the context must be finalized.

When a context is initialized within a system by the host processor, each MPU
(depending on how many MPUs are initialized for the context) is loaded with the
MARS kernel that stays resident in MPU storage and continues to run until the
host processor finalizes the context.

The context also creates the workload queue in host storage. Each kernel,
through the use of atomic synchronization primitives, will reserve and schedule
workloads from this queue.

When the context is finalized, all kernels running on the MPUs are terminated
and all resources are freed.

In a system, multiple MARS contexts may be initialized and the kernels and
workloads of each context will be independent of each other. However, one of the
main purposes of MARS is to avoid the high cost of process context switches
within MPUs initiated by the host processor. If multiple MARS contexts are
initialized, there will be an enormous decrease in performance as each MARS
context is context switched in and out. In the ideal scenario, there should be a
single MARS context initialized for the whole system.
********************************************************************************
**/

/**
********************************************************************************
\defgroup group_mars_mutex Mutex Management API

A MARS mutex instance can be used to protect blocks of code from executing
simultaneously whether it be in a host program or MPU program. This can be
useful if some code in the host program or MPU program accesses some common
resource, such as a global variable. If the block of code is protected by the
MARS mutex, it is guaranteed that the protected block of code in the host
program will not be executed simultaneously as any other host program thread or
any other MPU program.

The MARS mutex is independent of the MARS context or MARS workload model.
A MARS mutex can be used in a host program without even creating a MARS context.
A MARS mutex can also be used in an MPU program independent of any MARS workload
model or API. However, with an MPU program independent of any MARS workload
model, the user is responsible for loading and executing that program, and such
usage has little meaning with regard to MARS.

The MARS mutex does not call into the MARS kernel's scheduler. This means that
when some entity attempts to lock a mutex that is already locked, the mutex
blocks execution of that entity until the lock can be obtained. On the MPU
side, this means that the MARS kernel cannot schedule any other workloads while
a MARS mutex is waiting to lock.

\note The use of mutexes should be mainly reserved for implementing the workload
model layer. Access to the mutex API is limited to the workload module layer
from the MPU-side. It is left up to the workload module whether or not to
provide access to mutex routines through the workload model API.

\note If you want synchronization methods that call into the MARS kernel's
scheduler and allow other workloads to be scheduled while a synchronization
object waits, refer to the synchronization methods provided by the various
workload models.
********************************************************************************
**/

/**
********************************************************************************
\defgroup group_mars_workload_queue Workload Queue Management API

The workload queue API is the interface between the MARS context and the MARS
workload model host library. The host library of the workload model
implementation will need to use the workload queue API in order to provide
proper workload management functionalities.

The workload queue API provides the basic functions to create, schedule, and
remove workload contexts within the workload queue. It also provides APIs for
signal handling of workloads and for waiting on specific workloads to complete.
********************************************************************************
**/

/**
********************************************************************************
\defgroup group_mars_workload_module Workload Module Management API

The workload module API is the interface between the MARS kernel and the
MARS workload module for the specified workload model. The workload specific
module will need to use the workload module API in order to provide proper
workload management functionalities.

The workload module API provides the basic functions to get various workload
information, schedule other workloads, handle workload signals, and also
functions to transition the workload state and return execution back to the MARS
kernel.

********************************************************************************
**/

/**
********************************************************************************
\defgroup group_mars_task Task Management API

The MARS task is one type of MARS workload model. The MARS task is a single
execution of an MPU program that is scheduled to be run by the MARS kernel.

Tasks can be used to run a small MPU program many times. However, the primary
usage of the task model is for large-grained programs that take a long time to
process. Since tasks may occupy the MPU for a long time and prevent other
workloads from being executed on that MPU, tasks have the ability to yield the
MPU to other workloads.

The MARS task synchronization API also provides various methods that allow a
task to enter a wait state while waiting for certain events. When tasks have
yielded or are waiting, the task state is saved to host storage and the MPU is
freed to process other available workloads.
********************************************************************************
**/

/**
********************************************************************************
\defgroup group_mars_task_sync Task Synchronization API

The MARS Task Synchronization API provides various methods of synchronization
between the host program running on the host processor and MARS tasks running on
the MPUs, as well as between MARS tasks and other MARS tasks running across
various MPUs.

As described previously, enabling MARS tasks to send and receive data directly
between each other, independently of the host, is an important factor in
improving the usability and efficiency of MPUs. MARS provides various
synchronization and communication functions which enable efficient interaction
between MARS tasks or between MARS tasks and host programs.

The MARS Task Synchronization API provides the following types of
synchronization objects:

<b>(1) MARS Task Barrier</b>

This is used to make multiple MARS tasks wait at a certain point in a program
and to resume the task execution when all tasks are ready.

<b>(2) MARS Task Event Flag</b>

This is used to send event notifications between MARS tasks or between MARS
tasks and host programs.

<b>(3) MARS Task Queue</b>

This is used to provide a FIFO queue mechanism for data transfer between MARS
tasks or between MARS tasks and host programs.

<b>(4) MARS Task Semaphore</b>

This is used to limit the number of concurrent accesses to shared resources
among MARS tasks.

<b>(5) MARS Task Signal</b>

This is used to signal a MARS task in the waiting state to change state so that
it can be scheduled to continue execution.
********************************************************************************
**/

/**
********************************************************************************
\ingroup group_mars_task_sync
\defgroup group_mars_task_barrier Task Barrier API

The MARS task barrier allows the synchronization of multiple tasks. At barrier
initialization, the total number of tasks that need to be synchronized is
specified. When each task arrives at the barrier, it will notify the barrier and
enter a waiting state until the barrier is released. When the total number of
tasks specified at initialization have arrived at the barrier and notified it,
the barrier is released and all tasks are returned to the ready state to be
scheduled to run once again.
********************************************************************************
**/

/**
********************************************************************************
\ingroup group_mars_task_sync
\defgroup group_mars_task_event_flag Task Event Flag API

The MARS task event flag allows synchronization between multiple tasks and the
host program through the sending and receiving of 32-bit event flags.

Event flags can be sent from the host program to a MARS task or vice versa, as
well as between multiple MARS tasks. A task that waits on certain event flag
bits transitions to the waiting state until those bits are received.
********************************************************************************
**/

/**
********************************************************************************
\ingroup group_mars_task_sync
\defgroup group_mars_task_queue Task Queue API

The MARS task queue allows for sending and receiving of data between multiple
MARS tasks and the host program.

Either a host program or a MARS task can push data into the queue, and either
can pop data out of the queue as soon as it becomes available.

The advantage of the MARS task queue is that when a MARS task requests a pop
and no data is yet available in the queue, the MARS task enters a waiting
state. As soon as data becomes available to be popped, the MARS task can be
scheduled to resume execution with the received data.
********************************************************************************
**/

/**
********************************************************************************
\ingroup group_mars_task_sync
\defgroup group_mars_task_semaphore Task Semaphore API

The MARS task semaphore allows for synchronization between multiple tasks by
limiting simultaneous access to some shared resource. At creation, the
semaphore is specified with the number of tasks that may access it
simultaneously at any given time.

Whenever a task wants to access a semaphore-protected shared resource, it must
first acquire the semaphore (P operation). When done accessing the shared
resource, it must then release the semaphore (V operation). If a task requests
the semaphore when other tasks have already acquired the total number of
allowed accesses, the task transitions to the waiting state until some other
task releases the semaphore and access is obtained.
********************************************************************************
**/

/**
********************************************************************************
\ingroup group_mars_task_sync
\defgroup group_mars_task_signal Task Signal API

The MARS task signal is the simplest form of synchronization between a host
program and multiple MARS tasks.

Either a host program or a MARS task can signal a specified task. A task that
waits for a signal transitions to the waiting state until the signal is
received.
********************************************************************************
**/
