3. Getting Started Guide
3.1 NeuroPilot Workflow
This section helps developers understand the NeuroPilot development process.
3.1.1 NeuroPilot Overview
NeuroPilot is a collection of software tools and APIs that allow users to create AI applications, based on neural network models, for MediaTek hardware platforms. With NeuroPilot, users can develop and deploy AI applications on edge devices with extremely high efficiency, while also keeping data private.
3.1.1.1 NeuroPilot Key Features
- NeuroPilot’s run-time software allows any TensorFlow Lite (.tflite) model to run on a NeuroPilot-compatible MediaTek platform. This capability is guaranteed by TensorFlow Lite and the MediaTek run-time layers that support it.
- NeuroPilot allows users to tune and optimize neural networks for latency and size, beyond what is offered by open framework tools. This can result in large gains in on-device performance, and reductions in power and memory consumption.
- NeuroPilot allows users to profile a neural network running on a target device. Users can use this capability to fine-tune their networks based on hardware device behavior.
- NeuroPilot offers support for bare-metal application programming on Linux devices. This allows for minimal overheads during run-time, which is important for embedded devices.
3.1.2 General Workflow
- Model Preparation: The user starts with a trained neural network model. This model includes the model definition (structure), and a set of model parameters as 32-bit floating-point (FP32) data. This is typically the final step of the training process.
- Model optimization: The user optimizes the model using one or more NeuroPilot tools. These tools take the trained model from the previous step and produce a more lightweight and optimized representation for MediaTek platforms. This process may involve several steps, and depends on the development workflow and target device. The resulting model is stored as a TensorFlow Lite (.tflite) or Deep Learning Archive (.dla) file.
- Model deployment: The optimized model is loaded at run-time on the target device. At run-time, the model may use one or more device SDK libraries and APIs. Certain neural-network operations (ops) have platform-specific limitations. For a list of supported operations for each hardware platform, see the platform’s documentation at section: NeuroPilot Introduction and Platform Specification -> 2. Hardware Support Specification.
NeuroPilot supports the following platform-specific development workflows.
- Android Development: The user wants to develop an AI app, AI platform features, or an AI framework, on a MediaTek device running Android.
- Embedded Linux Development: The user wants to deploy a neural network model on an embedded MediaTek device running Linux.
3.1.3 Android Development Workflow
This section describes the process for developing AI applications on MediaTek platforms running Android.
- MediaTek supports Google TensorFlow Lite’s standard AI model development workflow, including model conversion, optimization, interpretation, and inference, on Android mobile platforms. For details, see https://www.tensorflow.org/lite.
- MediaTek also provides the NeuroPilot SDK, which includes tools, APIs, and an on-device software stack (MTK Neuron SW). NeuroPilot SDK includes all of the features of Google TensorFlow Lite, while also offering more flexibility and better performance.
3.1.3.1 NeuroPilot Architecture
Figure: NeuroPilot Architecture
3.1.3.1.1 Development Phases
There are generally two development phases, as shown in Figure: NeuroPilot Architecture.
Phase 1: Model preparation and optimization (PC)
- Prepare a trained model.
- Optimize the trained model.
- Convert the optimized model to TensorFlow Lite (TFLite) format.
Phase 2: Model deployment (Device)
- Select the appropriate model deployment method. For details, see section: Developer Roles and Model Deployment Methods.
- Select the appropriate software integration entry point. For details, see section: NeuroPilot Entry Points.
3.1.3.1.2 NeuroPilot Android Software Stack
The NeuroPilot Android Software Stack is composed of the following layers, as shown in Figure: NeuroPilot Architecture:
- CV/NN APP: This layer contains the code that the user writes to run the AI application. This layer also includes the MTK Interpreter, which is a MediaTek-APU-optimized version of the Android TensorFlow Lite interpreter, and TFLite Shim API, which is a wrapper layer on top of MTK Interpreter designed to make API calls simpler.
- NN Runtime: This layer contains run-time libraries that provide neural network acceleration, including Google Neural Network API (NNAPI), and MediaTek Neuron compiler/runtime.
- Middleware: This layer provides dynamic control over the MediaTek AI compute cores, including Quality-of-Service (QoS) controls for running neural network workloads.
- Driver: This layer provides Android drivers for the special-purpose MediaTek AI compute cores.
3.1.3.1.3 NeuroPilot Entry Points
NeuroPilot Entry Points, shown in red in Figure: NeuroPilot Architecture, represent the different AI model software integration entry points that developers can choose from. See section: NeuroPilot Entry Points.
3.1.3.2 Developer Roles and Model Deployment Methods
Before integrating the converted TFLite AI model into an AI application, users need to select a model deployment method. The deployment method depends on the user’s developer role and development goal.
Figure: Developer Roles and Model Deployment Methods
3.1.3.2.1 Developer Roles
- 3rd-Party App Vendor: An Android application vendor other than Google or an OEM. This includes academic apps.
- 3rd-Party Algorithm Vendor: A company that partners with an OEM or 3rd-Party App Vendor to provide an AI algorithm solution.
- OEM: A manufacturer that develops mobile phones based on a MediaTek hardware platform (SOC, system-on-a-chip).
3.1.3.2.2 Model Deployment Methods
Online Compile
- Description: The model is in TensorFlow Lite format, or in a code-based format defined using the NNAPI or Neuron Adapter API. The application loads the model and then compiles it to DLA (Deep Learning Archive) format on the device. This process is platform independent; an AI application can load and then compile the same model on multiple MediaTek platforms. Users can also reuse the same model in multiple AI projects.
- Advantages
- MediaTek platform independent
- Neuron software version independent
- Customer project independent
- Disadvantages
- Initial latency (can be improved by caching)
- HIDL overhead
- Summary: Best for portability
Offline Compile
- Description: The TFLite model is compiled to DLA format on the user’s computer. The AI application loads the DLA file directly on the device and can immediately start inference. The DLA file is compiled for use on only one specific MediaTek platform.
- Advantages
- Low initial latency
- No HIDL overhead
- Disadvantages
- MDLA hardware version dependent
- Neuron software version dependent
- Summary: Best for performance
3.1.3.2.3 Developer and Model Deployment Methods Mapping
The following table maps developer roles and development goals to deployment methods and NeuroPilot entry points.
Role | Development Goal | NeuroPilot Entry Point | AI Application Location | Deployment Method | Notes
OEM | APK with AI features | | System partition | Online |
OEM | AI framework | | System partition | Online |
OEM | Platform AI features (e.g. in camera HAL) | | Vendor partition | Offline |
3rd-Party App Vendor | APK with AI features | | System partition | Online |
3rd-Party App Vendor | AI framework | | System partition | Online |
3rd-Party Algorithm Vendor | Choice of AI model deployment path depends on the requirements of the algorithm vendor’s partner (OEM or 3rd-party app vendor) | | | |
Note: |
AI Application Location refers to the location of the final executable application. The concept of system and vendor partitions comes from the Android VNDK, and the partitions are enforced by the Android linker. For more details, see https://source.android.com/docs/core/architecture/vndk.
For more details about the Android linker, see https://source.android.com/docs/core/architecture/vndk/linker-namespace |
3.1.3.3 Android Development Workflow
Figure: Android Development Workflow
3.1.3.3.1 Step 1: Model Preparation
- Prepare a trained PyTorch, TensorFlow, or Caffe AI model.
- Install NeuroPilot tools. See section: NeuroPilot Installation and Setup.
3.1.3.3.2 Step 2: Model Optimization
Optimize your AI model by following the steps below.
- (Optional) Neural Architecture Search (MLKits, Premium only): NAS automatically searches for the optimal network architecture and can achieve higher model quality. See MLKitNAS. MLKits can perform NAS with optional pruning and quantization.
- (Optional) Model quantization: Quantization is a model optimization technique that converts floating-point model data into lower-bit unsigned or signed integers. Quantization offers both model compression and performance improvements with a negligible loss in output quality, by taking advantage of the integer-only operator implementation on MediaTek platforms. NeuroPilot offers two methods of quantization:
- Quantization-aware training (Quantization Tool): Higher quantization accuracy, for precision-sensitive scenarios. This method is more complex, because users are required to construct a pipeline for model training. See section: Developer Tools -> Model Development -> Quantization.
- Post-training quantization (Converter Tool): Lower quantization accuracy, for scenarios that are not sensitive to accuracy. Simpler to use than quantization-aware training. See section: Developer Tools -> Model Development -> Converter: Post-Training Quantization, and the sketch after this list.
- Model Conversion (Converter Tool): Convert the target model to TFLite format for deployment. See section: Developer Tools -> Model Development -> Converter.
- (Optional) Device simulation (AI Simulator): Users without a physical device can use AI Simulator to evaluate the performance of the AI model on a specific MediaTek platform. See AISimulator.
5a. (Required for Online Compile) On-device model compatibility check and performance evaluation (Neuron Compiler, NNBenchmark):
- Check that the model is compatible with the target platform’s APU using Neuron Compiler (ncc-tflite) with the command ncc-tflite {filename} --arch {target architecture} --check-target-only. See section: Developer Tools -> Model Development -> Neuron SDK: Neuron Compiler and Runtime.
- Evaluate the performance of the model on a real device using the NNBenchmark Demo provided by MediaTek. See section: Getting Started Guide -> Hello World Tutorial -> Java TFlite Tutorial and section: Getting Started Guide -> Hello World Tutorial -> C++ TFlite Tutorial.
5b. (Required for Offline Compile) Offline model compilation, performance evaluation, and optimization (Neuron SDK): Compile the model to DLA format using Neuron Compiler (ncc-tflite), and then evaluate performance on a real device using Neuron Runtime Profiler. See NeuronProfiler. Users can also perform additional optimization workflows, such as TCM, GNO, and Compiler Custom API, using Neuron SDK. See section: Developer Tools -> Model Development -> Neuron SDK.
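The sketch below illustrates step 2 (post-training quantization followed by conversion) with the Python Converter Tool. The from_script_module_file and convert_to_tflite calls are the same ones used later in the Hello World tutorial; the quantize and calibration_data_gen option names are assumptions for illustration only, so check section: Developer Tools -> Model Development -> Converter: Post-Training Quantization for the exact options supported by your Converter Tool version.
# Post-training quantization sketch (option names are assumptions; see the
# Converter: Post-Training Quantization section for the authoritative API).
import numpy as np
import mtk_converter

converter = mtk_converter.PyTorchConverter.from_script_module_file(
    'mobilenet_v2_float.pt', [[1, 3, 224, 224]]
)

def calibration_data_gen():
    # A small set of representative inputs drives the quantization ranges.
    # Replace the random data with real samples from your dataset.
    for _ in range(10):
        yield [np.random.randn(1, 3, 224, 224).astype(np.float32)]

converter.quantize = True                                # assumed option name
converter.calibration_data_gen = calibration_data_gen    # assumed option name
_ = converter.convert_to_tflite(output_file='mobilenet_v2_quant.tflite')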
3.1.3.3.3 Step 3. Model Deployment
1. Model Deployment with Online Compile
Users can deploy AI models (TFLite) in their code using one of the following entry points.
- TFLite Shim API + Neuron Delegate
- TFLite Shim API + NNAPI Delegate
For details, see section: Developer Tools -> Application Development -> TFLite Shim API.
2. TFLite Neuron Delegate vs. TFLite NNAPI Delegate
MediaTek recommends using Neuron Delegate, for the following reasons.
- Better performance because of fewer software layers.
- Better flexibility:
- Neuron Delegate allows the user to create and use custom APIs.
- Neuron Delegate allows the user to create custom operations (ops) that are not in the TensorFlow Lite built-in ops.
Figure: TFLite Neuron Delegate vs. TFLite NNAPI Delegate
3. Model Deployment with Offline Compile
Use the Neuron Runtime API to deploy a DLA model in the vendor partition, in order to provide platform AI features, such as in the Camera HAL.
For details, see section: Developer Tools -> Model Development -> Neuron SDK: Neuron Runtime API.
3.1.3.4 Custom AI Framework Based on NeuroPilot
This section is for users who want to use NeuroPilot to develop a custom AI Framework, similar to Paddle Lite or MACE, for a MediaTek platform.
3.1.3.4.1 Development Overview
Figure: AI Custom Framework
- The user is responsible for the development and maintenance of the AI engine tools and AI model adaptation layers, as shown in Figure: AI Custom Framework.
- Developers should use the Neuron Adapter API to develop the custom AI framework. For details, see section: Developer Tools -> Application Development -> Neuron Adapter API.
- To maximize the energy efficiency of the custom AI framework, it must be adapted to hardware platform features such as:
- APU operator restrictions
- Operator fusion rules
- Compiler optimization parameters
3.1.3.4.2 Development Workflow
The typical Neuron Adapter API development workflow is shown below:
Figure: Neuron Adapter API Workflow
3.1.3.5 NeuroPilot Entry Points
3.1.3.5.1 API Entry Overview
Entry Point | Description
NeuroPilot TFLite Shim API + Neuron Delegate | Online compile; Java and C; for APKs with AI features, deployed in the system partition
NeuroPilot TFLite Shim API + NNAPI Delegate | Online compile; Java and C; for APKs with AI features, deployed in the system partition
NeuroPilot Neuron Adapter API | Online compile; C; for custom AI frameworks, deployed in the system partition
Google NNAPI | Online compile; C; for AI frameworks, deployed in the system partition; part of Google AOSP
NeuroPilot Neuron SDK (Neuron Runtime API) | Offline compile on the developer’s computer; C; for platform AI features (e.g., in Camera HAL), deployed in the vendor partition
3.1.3.5.2 NeuroPilot TFLite Shim API + Neuron Delegate
Figure: NeuroPilot TFLite Shim API + Neuron Delegate Entry Point
- API entry (*)
- NeuroPilot TFLite Shim API
- Languages
- Java, C
- Deployment mode
- Online compile mode
- Target Developers
- 3rd-party app / OEM with the goal of developing APK with AI features
- Implementation location
- System partition
3.1.3.5.3 NeuroPilot TFLite Shim API + NNAPI Delegate
Figure: NeuroPilot TFLite Shim API + NNAPI Delegate Entry Point
- API entry (*)
- NeuroPilot TFLite Shim API
- Languages
- Java, C
- Deployment mode
- Online compile mode
- Target Developers
- 3rd-party app / OEM with the goal of developing APK with AI features
- Implementation location
- System partition
3.1.3.5.4 NeuroPilot Neuron Adapter API
Figure: NeuroPilot Neuron Adapter API Entry Point
- API entry (*)
- Neuron Adapter API
- Languages
- C
- Deployment mode
- Online compile mode
- Target Developers
- 3rd-party app / OEM developing an AI framework
- Implementation location
- System partition
3.1.3.5.5 Google NNAPI
Figure: Google NNAPI Entry Point
- API entry (*)
- NNAPI
- Languages
- C
- Deployment mode
- Online compile mode
- Target Developers
- 3rd-party app / OEMs developing an AI framework
- Implementation location
- System partition
- Limitations
- Belongs to Google AOSP
- If an op is not supported by NNAPI, it cannot be added as a custom op.
- Examples
- https://github.com/android/ndk-samples/tree/main/nn-samples
3.1.3.5.6 NeuroPilot Neuron SDK
Figure: NeuroPilot Neuron SDK Entry Point
- API entry (*)
- Neuron Runtime API
- Languages
- C
- Deployment mode
- Offline compile on developer’s computer
- Platform dependent
- Target Developers
- OEM algorithm teams, 3rd-party algorithm vendors with the goal of developing platform AI features (e.g., in Camera HAL)
- Implementation location
- Vendor partition
3.2 NeuroPilot Installation and Setup
This chapter is intended to guide users through the installation process for NeuroPilot and its component software tools.
3.2.1 Setting Up Android Studio
Android Studio is required to run several of the examples and tutorials in this document. Follow these steps to install and set up Android Studio.
- Download and then install the latest version of Android Studio.
- Run Android Studio, and then open SDK Manager.
- Under SDK Platforms, install all required Android versions.
- Under SDK Tools, install NDK.
3.2.2 Quantization Tool Installation Guide
3.2.2.1 System requirements
- Python 3.5 or 3.6
3.2.2.2 Python dependencies
Required:
- argparse >= 1.2
- jsonschema
- numpy >= 1.13.3
- packaging
Optional:
- tensorflow version: >= 1.13 and < 2.0 (required by TensorFlow V1 quantization tool)
- tensorflow version: >= 2.0 and < 2.5 (required by TensorFlow V2 quantization tool)
- torch version: >= 1.3 and < 1.9 (required by PyTorch quantization tool)
3.2.2.3 Installation
3.2.2.3.1 Installing Quantization Tool within a Virtual Environment
- Download the Quantization Tool installation package from neuropilot_downloads.
Quantization Tool is provided as a wheel package containing all the required installation scripts and files.
- Install the package using pip by running the following command.
$ pip3 install mtk_quantization-<version>-py3-none-any.whl
- Verify the installation. The below script prints the installed package version.
$ python3 -c 'import mtk_quantization; print(mtk_quantization.__version__)'
3.2.2.3.2 Installing Quantization Tool without a Virtual Environment
- Download the Quantization Tool installation package from neuropilot_downloads.
Quantization Tool is provided as a wheel package containing all the required installation scripts and files.
- Install the package using pip by running the following command.
$ pip3 install --user mtk_quantization-<version>-py3-none-any.whl
- Verify the installation. The below script prints the installed package version.
$ python3 -c 'import mtk_quantization; print(mtk_quantization.__version__)'
- Because the package is installed in user mode (--user), you might need to configure the PATH environment variable in order to use the executables. The executables are installed in the Python user install directory for your platform, which is typically ~/.local/bin.
$ export PATH=~/.local/bin:$PATH
3.2.3 Converter Tool Installation Guide
3.2.3.1 System requirements
- 64-bit Linux.
- Python 3.5 or 3.6
- pip >= 8.1.0 (required to install manylinux1 wheels)
Warning: |
For Python 3.5, versions 3.5.0 and 3.5.1 are not supported because of an ABI incompatibility issue in the pybind11 package. See the GitHub issue for more details. |
3.2.3.2 Python dependencies
Required:
- argparse >= 1.2
- flatbuffers >= 1.12
- matplotlib >= 3.0
- numpy >= 1.13.3
- packaging
- protobuf >= 3.5.1
- pybind11 >= 2.2
- tqdm >= 4.0
Optional:
- tensorflow version: >= 1.13 and < 2.5 (required by TensorFlow V1 Converter)
- tensorflow version: >= 2.0 and < 2.5 (required by TensorFlow Converter)
- torch version: >= 1.3 and < 1.9 (required by PyTorch Converter)
3.2.3.3 Installation
3.2.3.3.1 Installing Converter Tool within a Virtual Environment
- Download the Converter Tool installation package from neuropilot_downloads.
Converter Tool is provided as a wheel package containing all the required installation scripts and files.
- Install the package using pip by running the following command.
# For Python 3.5
$ pip3 install mtk_converter-<version>-cp35-cp35m-manylinux1_x86_64.whl
# For Python 3.6
$ pip3 install mtk_converter-<version>-cp36-cp36m-manylinux1_x86_64.whl
- Verify the installation. The below script prints the installed package version.
$ python3 -c 'import mtk_converter; print(mtk_converter.__version__)'
3.2.3.3.2 Installing Converter Tool without a Virtual Environment
- Download the Converter Tool installation package from neuropilot_downloads.
Converter Tool is provided as a wheel package containing all the required installation scripts and files.
- Install the package using pip by running the following command.
# For Python 3.5
$ pip3 install --user mtk_converter-<version>-cp35-cp35m-manylinux1_x86_64.whl
# For Python 3.6
$ pip3 install --user mtk_converter-<version>-cp36-cp36m-manylinux1_x86_64.whl
- Verify the installation. The below script prints the installed package version.
$ python3 -c 'import mtk_converter; print(mtk_converter.__version__)'
- Because the package is installed in user mode (--user), you might need to configure the PATH environment variable in order to use the executables. The executables are installed in the Python user install directory for your platform, which is typically ~/.local/bin.
$ export PATH=~/.local/bin:$PATH
3.2.4 Neuron SDK
The Neuron SDK allows users to convert their custom models to MediaTek-proprietary binaries for deployment on MediaTek platforms. The resulting models are highly efficient, with reduced latency and a smaller memory footprint. Users can also create a runtime environment, parse compiled model files, and perform inference on the edge. Neuron SDK is aimed at users who are performing bare metal C/C++ programming for AI applications, and offers an alternative to the Android Neural Networks API (NNAPI) for deploying Neural Network models on MediaTek-enabled Android devices.
3.2.4.1 Requirements
We recommend using the Neuron SDK tool in the following environment:
- CPU: x86-64
- Operating system:
- Ubuntu 14.04
- Ubuntu 16.04
- Ubuntu 18.04
3.2.4.2 Installing Neuron SDK
- Download the Neuron SDK package from neuropilot_downloads.
Neuron SDK is provided as a compressed tar package titled <date>_MDLA_vx.x_SW_<version>_release.tar.gz. This package contains all the required installation scripts and files.
- Uncompress the Neuron SDK package in your working directory by running the following commands.
# Uncompress the tar package
$ tar -xf <date>_MDLA_vx.x_SW_<version>_release.tar.gz
# Change to tool's root location
$ cd <date>_MDLA_vx.x_SW_<version>_release
All executables under the host directory can be executed directly. For more information, see readme.txt.
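Once the SDK is installed, you can run the model compatibility check described in the Android workflow (step 5a) directly from the host directory. The following Python wrapper is only an illustrative sketch; it assumes that ncc-tflite from the SDK’s host directory is on your PATH and uses only the --arch and --check-target-only flags shown in that step.
# check_model.py: illustrative wrapper around the documented compatibility
# check. Assumes ncc-tflite (from the SDK host directory) is on PATH.
import subprocess
import sys

def check_model(tflite_path, arch):
    """Return True if ncc-tflite reports that the model can run on `arch`."""
    result = subprocess.run(
        ['ncc-tflite', tflite_path, '--arch', arch, '--check-target-only'],
        capture_output=True, text=True)
    print(result.stdout, end='')
    print(result.stderr, end='', file=sys.stderr)
    return result.returncode == 0

if __name__ == '__main__':
    # Usage: python3 check_model.py <model.tflite> <target architecture>
    # See the SDK documentation for the --arch values valid for your platform.
    sys.exit(0 if check_model(sys.argv[1], sys.argv[2]) else 1)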
3.3 Hello World Tutorial
3.3.1 Neural Network Model Creation
3.3.1.1 TensorFlow Model to TFLite
MediaTek provides a converter tool which takes a protobuf (.pb) file and produces a TFLite file (.tflite) that is ready to run.
This example uses the MobileNetV1 neural network. This neural network model is an image classifier network that is widely used on mobile devices. This network takes images at 224x224 resolution, and classifies each of them into one of 1000 different classes.
The objective of this example is to take a trained MobileNetV1 model and produce a .tflite model that is ready to use on a MediaTek Android device. The following steps show how to convert the MobileNetV1 model from TensorFlow to TFLite.
3.3.1.1.1 Floating-point Model Conversion Example
The most common case is to start with a trained TensorFlow model in FP32 format.
# Download MobileNet_V1_224 TensorFlow float frozen model
$ wget http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz
$ tar xvf mobilenet_v1_1.0_224.tgz
$ mtk_tensorflow_v1_converter \
--input_frozen_graph_def_file mobilenet_v1_1.0_224_frozen.pb \
--output_file mobilenet_v1_mtk.tflite \
--input_names=input \
--input_shapes=1,224,224,3 \
--output_names=MobilenetV1/Logits/SpatialSqueeze
In order to perform this conversion, the user must know which tensors in the network are the inputs and outputs. Because the MobileNetV1 network is a public reference model, we can specify the tensor names directly in this example.
Note: |
The values of arguments input_names, output_names, and input_shapes used in the example can be checked using TensorBoard or another model visualization tool, such as Netron. |
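If you prefer a script to a visualization tool, the input names and shapes can also be read directly from the frozen graph. The following sketch is not part of the Converter Tool; it assumes the optional tensorflow dependency is installed.
# List the placeholder (input) nodes of the frozen graph. Works with the
# TensorFlow 1.x-compatible API (tf.compat.v1).
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with open('mobilenet_v1_1.0_224_frozen.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op == 'Placeholder':   # model inputs
        print('input:', node.name, node.attr['shape'])
# Output tensors are typically the nodes that no other node consumes; for
# MobileNetV1 the logits tensor is MobilenetV1/Logits/SpatialSqueeze.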
When the converter command finishes, you should see output similar to the following.
Importing the TensorFlow model ...
done
Converting the model ...
[General Transformations]
Before Transform
Subgraph # 0: 223 operators 361 tensors.
After Pass # 1, Elapsed time: 0:00:00.317777
Subgraph # 0: 251 operators 417 tensors.
After Pass # 2, Elapsed time: 0:00:00.499618
Subgraph # 0: 140 operators 253 tensors.
After Pass # 3, Elapsed time: 0:00:00.659985
Subgraph # 0: 30 operators 88 tensors.
[Post-General Transformations]
Before Transform
Subgraph # 0: 30 operators 88 tensors.
done
Exporting the TFLite model ...
done
3.3.1.1.2 Quantized Model Conversion Example
This example shows how to take a quantized network model and produce a model optimized for MediaTek devices. This is often convenient when a quantized model is already available. Some quantized models are tuned to a very high accuracy using many re-training iterations, which can require significant time and compute resources; starting from such a model may yield better final accuracy. In this mode, the input file must be in protobuf format (.pb).
# Download MobileNet_V1_224 TensorFlow fake-quantized frozen model
$ wget https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz
$ tar xvf mobilenet_v1_1.0_224_quant.tgz
# Convert to quantized TFLite with input min-max range as (-1, 1)
$ mtk_tensorflow_v1_converter \
--input_frozen_graph_def_file mobilenet_v1_1.0_224_quant_frozen.pb \
--output_file mobilenet_v1_quant_mtk.tflite \
--input_names=input \
--input_shapes=1,224,224,3 \
--output_names=MobilenetV1/Predictions/Reshape_1 \
--quantize=True \
--input_value_ranges=-1,1
Note: |
The value of arguments input_names, output_names, and input_shapes used in the example can be checked using TensorBoard or another model visualization tool, such as Netron. |
3.3.1.2 PyTorch Model to TFLite
This example shows how to convert a PyTorch model to TFLite format (.tflite).
import torch
import torchvision
# Download and prepare the PyTorch model
model = torchvision.models.mobilenet_v2(pretrained=True)
trace_data = torch.randn(1, 3, 224, 224)
trace_model = torch.jit.trace(model.cpu().eval(), (trace_data))
torch.jit.save(trace_model, 'mobilenet_v2_float.pt')
# Convert to TFLite
import mtk_converter
converter = mtk_converter.PyTorchConverter.from_script_module_file(
'mobilenet_v2_float.pt', [[1, 3, 224, 224]]
)
_ = converter.convert_to_tflite(output_file='mobilenet_v2_float.tflite')
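As an optional sanity check, the converted model can be run with the standard TensorFlow Lite interpreter before deploying it to a device. This is not part of the converter output; the sketch below assumes the tensorflow package is installed, and reads the expected input shape from the model rather than assuming a particular layout.
# Run the converted model once on random data to confirm it loads and
# executes. Check inp['shape'] to see the layout the converter produced.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='mobilenet_v2_float.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

data = np.random.randn(*inp['shape']).astype(np.float32)
interpreter.set_tensor(inp['index'], data)
interpreter.invoke()
print('output shape:', interpreter.get_tensor(out['index']).shape)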
3.3.2 Java TFlite Tutorial
3.3.2.1 Model Preparation
In order to run neural network models on an Android device, the network model must be prepared according to the tutorial in the Neural Network Model Creation section (3.3.1). Please read and understand that tutorial before proceeding.
3.3.2.2 Java Sample Code
The following sample Java application is a simple timed benchmark for running a MobileNet image classifier model. The sample application follows the Java application development flow described in the Android Development Workflow section. The Android project includes a sample image, which is used as an input to the network. The application invokes the neural network Interpreter, receives the output classification, and reports the latency of the image inference.
Sample Java benchmark app
Sample source code is provided in the file Android_S_NNBenchmark on the Downloads page.
Note: |
neuropilot.aar provides the NeuroPilot Java API. This library is included in the Java sample code package. |
3.3.2.3 Application Code
The following section describes the major steps of the sample app, to help explain the process of invoking neural network models in Android. The code below can be found in the file app/src/main/java/com/mediatek/nn/benchmark/NNTestBase.java.
- Import the NeuroPilot optimized interpreter, interpreter options, delegate, and delegate options.
import com.mediatek.neuropilot.Delegate;
import com.mediatek.neuropilot.nnapi.NnApiDelegate;
import com.mediatek.neuropilot.neuron.NeuronDelegate;
import com.mediatek.neuropilot.Interpreter;
import com.mediatek.neuropilot.Interpreter.Options;
- Memory-map the model file in the APK Assets.
private MappedByteBuffer loadModelFile() throws IOException {
AssetFileDescriptor fileDescriptor = mActivity.getAssets().openFd(mModelName + ".tflite");
FileInputStream inputStream = new FileInputStream(fileDescriptor.getFileDescriptor());
FileChannel fileChannel = inputStream.getChannel();
long startOffset = fileDescriptor.getStartOffset();
long declaredLength = fileDescriptor.getDeclaredLength();
return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength);
}
- Generate the input data and output buffer from the image file.
private ByteBuffer generateTestInout(String modelName) {
int batchsize = 1;
int imageSizeX = 224;
int imageSizeY = 224;
int dimPixelSize = 3;
int numBytesPerChannel = 1;
if (modelName.contentEquals(new String("mobilenet_float"))) {
numBytesPerChannel = 4;
}
mImgData = ByteBuffer.allocateDirect(batchsize
* imageSizeX
* imageSizeY
* dimPixelSize
* numBytesPerChannel);
mImgData.order(ByteOrder.nativeOrder());
mImgData.rewind();
Bitmap bmp = Util.getBitmapFromAsset(mActivity.getApplicationContext(), "grace_hopper.bmp");
Bitmap scaledBmp = Bitmap.createScaledBitmap(bmp, imageSizeX, imageSizeY, true);
if (modelName.contentEquals(new String("mobilenet_float"))) {
convertBitmapToFloatBuffer(scaledBmp, imageSizeX, imageSizeY, mImgData);
mLabelProbArrayFloat = new float[1][mLabelList.size()];
} else if (modelName.contentEquals(new String("mobilenet_quantized"))) {
convertBitmapToByteBuffer(scaledBmp, imageSizeX, imageSizeY, mImgData);
mLabelProbArray = new byte[1][mLabelList.size()];
}
return mImgData;
}
Note: |
Floating-point models require 4 bytes per (color) channel, so the size of the input buffer is larger if the model uses floating-point values. Integer models require only 1 byte per channel, and consequently require less memory to store input images. For example, with a batch size of 1 and a 224x224 RGB input, the float buffer is 1 x 224 x 224 x 3 x 4 = 602,112 bytes, while the quantized buffer is 150,528 bytes. |
- Initialize the interpreter with Neuron delegate or NNAPI.
private Interpreter mInterpreter;
private Options mOptions;
public final void createBaseTest(NNBenchmark ipact) {
mActivity = ipact;
mLabelList = loadLabelList(mModelName);
mImgData = generateTestInout(mModelName);
try {
// new interpreter options
mOptions = new Options();
// use neuron delegate
//NeuronDelegate.Options neuron_options = new NeuronDelegate.Options();
//neuron_options.setCacheDir(mActivity.getFilesDir().getAbsolutePath());
//neuron_options.setAllowFp16(true);
//neuron_options.setExecutionPreference(NeuronDelegate.Options.EXECUTION_PREFERENCE_SUSTAINED_SPEED);
//Delegate delegate = new NeuronDelegate(neuron_options);
// use nnapi
NnApiDelegate.Options nnapi_options = new NnApiDelegate.Options();
nnapi_options.setCacheDir(mActivity.getFilesDir().getAbsolutePath());
nnapi_options.setAllowFp16(true);
nnapi_options.setExecutionPreference(NnApiDelegate.Options.EXECUTION_PREFERENCE_SUSTAINED_SPEED);
Delegate delegate = new NnApiDelegate(nnapi_options);
// add delegate
mOptions.addDelegate(delegate);
mInterpreter = new Interpreter(loadModelFile(), mOptions);
mInterpreter.resizeInput(0, mInputShape);
} catch (IOException e) {
e.printStackTrace();
}
}
Options in NNAPI:
/**
* Use the default setting. The default setting of NNAPI is
* EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER
*/
public static final int EXECUTION_PREFERENCE_UNDEFINED = -1;
/**
* Prefer executing in a way that minimizes battery drain. This is desirable for compilations
* that will be executed often.
*/
public static final int EXECUTION_PREFERENCE_LOW_POWER = 0;
/**
* Prefer returning a single answer as fast as possible, even if this causes more power
* consumption.
*/
public static final int EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER = 1;
/**
* Prefer maximizing the throughput of successive frames, for example when processing successive
* frames coming from the camera.
*/
public static final int EXECUTION_PREFERENCE_SUSTAINED_SPEED = 2;
public static final int EXECUTION_PRIORITY_LOW = 90;
public static final int EXECUTION_PRIORITY_MEDIUM = 100;
public static final int EXECUTION_PRIORITY_HIGH = 110;
/**
* Sets the inference preference for precision/compilation/runtime tradeoffs.
*
* @param preference One of EXECUTION_PREFERENCE_LOW_POWER,
* EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER, and EXECUTION_PREFERENCE_SUSTAINED_SPEED.
*/
public Options setExecutionPreference(int preference) {
this.executionPreference = preference;
return this;
}
public Options setExecutionPrioriy(int executionPriority) {
this.executionPriority = executionPriority;
return this;
}
public Options setMaxCompilationTimeoutDurationNs(long maxCompilationTimeoutDurationNs) {
this.maxCompilationTimeoutDurationNs = maxCompilationTimeoutDurationNs;
return this;
}
public Options setMaxExecutionTimeoutDurationNs(long maxExecutionTimeoutDurationNs) {
this.maxExecutionTimeoutDurationNs = maxExecutionTimeoutDurationNs;
return this;
}
public Options setMaxExecutionLoopTimeoutDurationNs(long maxExecutionLoopTimeoutDurationNs) {
this.maxExecutionLoopTimeoutDurationNs = maxExecutionLoopTimeoutDurationNs;
return this;
}
/**
* Specifies the name of the target accelerator to be used by NNAPI. If this parameter is
* specified, the {@link #setUseNnapiCpu(boolean)} method won't have any effect.
*
* <p>Only effective on Android 10 (API level 29) and above.
*/
public Options setAcceleratorName(String name) {
this.acceleratorName = name;
return this;
}
/**
* Configure the location to be used to store model compilation cache entries. If either {@code
* cacheDir} or {@code modelToken} parameters are unset, NNAPI caching will be disabled.
*
* <p>Only effective on Android 10 (API level 29) and above.
*/
public Options setCacheDir(String cacheDir) {
this.cacheDir = cacheDir;
return this;
}
/**
* Sets the token to be used to identify this model in the model compilation cache. If either
* {@code cacheDir} or {@code modelToken} parameters are unset, NNAPI caching will be disabled.
*
* <p>Only effective on Android 10 (API level 29) and above.
*/
public Options setModelToken(String modelToken) {
this.modelToken = modelToken;
return this;
}
/**
* Sets the maximum number of graph partitions that the delegate will try to delegate. If more
* partitions could be delegated than the limit, the partitions with the larger number of nodes are
* chosen. If unset, it will use the NNAPI default limit.
*/
public Options setMaxNumberOfDelegatedPartitions(int limit) {
this.maxDelegatedPartitions = limit;
return this;
}
/**
* Enable or disable the NNAPI CPU Device "nnapi-reference". If unset, it will use the NNAPI
* default settings.
*
* <p>Only effective on Android 10 (API level 29) and above.
*/
public Options setUseNnapiCpu(boolean enable) {
this.useNnapiCpu = !enable;
return this;
}
/**
* Enable or disable to allow fp32 computation to be run in fp16 in NNAPI. See
* https://source.android.com/devices/neural-networks#android-9
*
* <p>Only effective on Android 9 (API level 28) and above.
*/
public Options setAllowFp16(boolean enable) {
this.allowFp16 = enable;
return this;
}
Options in Neuron Delegate:
/**
* Use the default setting. The default setting of NEURON is
* EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER
*/
public static final int EXECUTION_PREFERENCE_UNDEFINED = -1;
/**
* Prefer executing in a way that minimizes battery drain. This is desirable for compilations
* that will be executed often.
*/
public static final int EXECUTION_PREFERENCE_LOW_POWER = 0;
/**
* Prefer returning a single answer as fast as possible, even if this causes more power
* consumption.
*/
public static final int EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER = 1;
/**
* Prefer maximizing the throughput of successive frames, for example when processing successive
* frames coming from the camera.
*/
public static final int EXECUTION_PREFERENCE_SUSTAINED_SPEED = 2;
public static final int EXECUTION_PRIORITY_LOW = 90;
public static final int EXECUTION_PRIORITY_MEDIUM = 100;
public static final int EXECUTION_PRIORITY_HIGH = 110;
/**
* Sets the inference preference for precision/compilation/runtime tradeoffs.
*
* @param preference One of EXECUTION_PREFERENCE_LOW_POWER,
* EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER, and EXECUTION_PREFERENCE_SUSTAINED_SPEED.
*/
public Options setExecutionPreference(int preference) {
this.executionPreference = preference;
return this;
}
public Options setExecutionPrioriy(int executionPriority) {
this.executionPriority = executionPriority;
return this;
}
public Options setEnableLowLatency(boolean enableLowLatency) {
this.enableLowLatency = enableLowLatency;
return this;
}
public Options setEnableDeepFusion(boolean enableDeepFusion) {
this.enableDeepFusion = enableDeepFusion;
return this;
}
public Options setEnableBatchProcessing(boolean enableBatchProcessing) {
this.enableBatchProcessing = enableBatchProcessing;
return this;
}
public Options setBoostValue(int boostValue) {
this.boostValue = boostValue;
return this;
}
public Options setBoostDuration(int boostDuration) {
this.boostDuration = boostDuration;
return this;
}
/**
* Specifies the name of the target accelerator to be used by NNAPI. If this parameter is
* specified, the {@link #setUseNnapiCpu(boolean)} method won't have any effect.
*
* <p>Only effective on Android 10 (API level 29) and above.
*/
public Options setAcceleratorName(String name) {
this.acceleratorName = name;
return this;
}
/**
* Configure the location to be used to store model compilation cache entries. If either {@code
* cacheDir} or {@code modelToken} parameters are unset, NNAPI caching will be disabled.
*
* <p>Only effective on Android 10 (API level 29) and above.
*/
public Options setCacheDir(String cacheDir) {
this.cacheDir = cacheDir;
return this;
}
/**
* Sets the token to be used to identify this model in the model compilation cache. If either
* {@code cacheDir} or {@code modelToken} parameters are unset, NNAPI caching will be disabled.
*
* <p>Only effective on Android 10 (API level 29) and above.
*/
public Options setModelToken(String modelToken) {
this.modelToken = modelToken;
return this;
}
/**
* Sets the maximum number of graph partitions that the delegate will try to delegate. If more
* partitions could be delegated than the limit, the partitions with the larger number of nodes will
* be chosen. If unset, it will use the NNAPI default limit.
*/
public Options setMaxNumberOfDelegatedPartitions(int limit) {
this.maxDelegatedPartitions = limit;
return this;
}
/**
* Enable or disable to allow fp32 computation to be run in fp16 in NNAPI. See
* https://source.android.com/devices/neural-networks#android-9
*
* <p>Only effective on Android 9 (API level 28) and above.
*/
public Options setAllowFp16(boolean enable) {
this.allowFp16 = enable;
return this;
}
- Run the Interpreter.
public void runTest() {
if (mInterpreter != null) {
if (mLabelProbArray != null) {
mInterpreter.run(mImgData, mLabelProbArray);
} else if (mLabelProbArrayFloat != null) {
mInterpreter.run(mImgData, mLabelProbArrayFloat);
}
}
}
When the interpreter runs, it produces an array as output. This output array is a set of class probabilities that indicate how likely each possible classification is, based on the network evaluation. The class with the highest probability is the class that is reported for the image.
3.3.3 C++ TFlite Tutorial
3.3.3.1 Model Preparation
In order to run neural network models on an Android device, the network model must be prepared according to the tutorial in the Neural Network Model Creation section (3.3.1). Please read and understand that tutorial before proceeding to this tutorial.
Note: |
In general, nearly all .tflite models will run on Android devices. However, some types of operations in the neural network model may cause large differences in run time speed, due to special-cases of operation support, both from the Android version itself, and operation support for a given device. Consult the NeuroPilot Introduction and Platform Specification -> 2. Hardware Support Specification section for more details on device capabilities and operation support. |
3.3.3.2 Native App Development
The most effective way to get top performance on Android devices is to develop applications using the Android NDK. In this native method of development, users write the app in C++ and call APIs provided by the NDK. NNAPI is one of these APIs, and there is also a TFLite C++ API. This method still provides all of the run-time control of the TFLite interpreter, but yields smaller and more compact applications that can be highly tuned for performance.
3.3.3.3 Native Sample App
We provide a sample native application for reference. This application is based on the MobileNetSSD neural network. This network is an object detection network, which takes images as inputs and detects the presence of known objects in each image. Object detection networks can identify an arbitrary number of objects in a single image, including objects that visually overlap one another. The output of this network is a series of bounding boxes that identify the region of the input image where each object lies, and the classification of each detected object.
This app is provided as a ready-to-compile example, object_detection_mobilessd.tar.gz, which includes several scripts to build and deploy the application.
3.3.3.3.1 Application Build and Load
To build and run this application, we need to perform the following steps.
- Build the application code.
cd build
rm -rf *
cmake -DTARGET=aarch64 -DNDK_STANDALONE_TOOLCHAIN=../android-ndk-r17b-toolchain-arm64 ../
make
cp MobilenetSSDDemo ../
cd ..
- Prepare the file inputs for the program.
cd input
python Convert2ModelInput.py voc_motor.jpg 300 300
python Convert2ModelInput.py voc_motor.jpg 224 224
cd ..
Note: |
The above commands are provided in the script 0_python_convert_input_2_bin.bat. |
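The provided Convert2ModelInput.py performs the actual pre-processing; the sketch below only illustrates the idea of turning an image into the raw binary tensors pushed in the following steps. The Pillow/NumPy usage and the [-1, 1] float normalization are assumptions, not a copy of the provided script.
# Illustrative only: resize an image and dump quantized (uint8) and float
# (normalized to [-1, 1]) input binaries, named like the files pushed below.
import sys
import numpy as np
from PIL import Image

def convert(image_path, width, height):
    img = Image.open(image_path).convert('RGB').resize((width, height))
    pixels = np.asarray(img, dtype=np.uint8)               # HWC, values 0-255
    pixels.tofile('input_%d%d_q.bin' % (width, height))    # quantized input
    scaled = pixels.astype(np.float32) / 127.5 - 1.0       # assumed scaling
    scaled.tofile('input_%d%d_f.bin' % (width, height))    # float input

if __name__ == '__main__':
    convert(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))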
- Push the application data onto the Android device.
adb push model/mobilenet_coco.tflite /data/local/tmp/mobilenet_coco.tflite
adb push model/mobilenet_coco_quant.tflite /data/local/tmp/mobilenet_coco_quant.tflite
adb push model/mobilenet_ssd_pascal.tflite /data/local/tmp/mobilenet_ssd_pascal.tflite
adb push model/mobilenet_ssd_pascal_quant.tflite /data/local/tmp/mobilenet_ssd_pascal_quant.tflite
Note: |
The above commands are provided in the script 2_push_model.bat. |
- Push the input data to the device.
adb push input/input_224224_q.bin /data/local/tmp/mobilenet_input_q.bin
adb push input/input_224224_f.bin /data/local/tmp/mobilenet_input_f.bin
adb push input/input_300300_q.bin /data/local/tmp/mobilenet_ssd_input_q.bin
adb push input/input_300300_f.bin /data/local/tmp/mobilenet_ssd_input_f.bin
Note: |
The above commands are provided in the script 3_push_input.bat. |
- Run the application on the device.
adb wait-for-device
adb root
adb remount
adb push MobilenetSSDDemo /data/local/tmp/MobilenetSSDDemo
# Run Test
adb shell "cd /data/local/tmp;chmod +x MobilenetSSDDemo;./MobilenetSSDDemo"
# Confirm APU service crash not happened
Note: |
The above commands are provided in the script 4_run.bat. |
3.3.3.4 Application Code
3.3.3.4.1 Mediatek Shim Layer
To aid the development of native app code, MediaTek provides a shim API which makes code development easier and faster. This shim layer invokes any required NeuroPilot libraries as well as the TFLite interpreter. The example shown here uses this shim layer.
3.3.3.4.2 Code Walk-Through
This application can be understood in several parts.
- Initialize the neural network Model.
ANeuralNetworksTFLite* tf;
auto time1 = std::chrono::high_resolution_clock::now();
if (ANeuralNetworksTFLite_create(&tf, model_path) != ANEURALNETWORKS_NO_ERROR){
return 0;
}
auto time2 = std::chrono::high_resolution_clock::now();
printf("Create and init time: %f ms\n",float((time2-time1).count())/1000000);
- Get a handle to the input Tensor, and fill it with data. In this case, this data will be a binary file loaded from disk.
TFLiteTensor inputTensor;
if (ANeuralNetworksTFLite_getTensor(tf,
TFLITE_BUFFER_TYPE_INPUT,
&inputTensor) != ANEURALNETWORKS_NO_ERROR){
ANeuralNetworksTFLite_free(tf);
return 0;
}
// Fill input data from file
std::ifstream input(input_path);
if (!input.good()) {
printf("Fail to read %s\n", input_path);
return 0;
}
if (inputTensor.type == 2) { // Is uint8
input.read((char*)inputTensor.buffer, sizeof(uint8_t) * inputTensor.bufferSize);
}else{ // Is float
input.read((char*)inputTensor.buffer, sizeof(float) * inputTensor.bufferSize);
}
input.close();
Note: |
In this sample application, input images are pre-processed into binary files via the script 0_python_convert_input_2_bin.bat. This script generates binary files which can be directly copied into the neural network input tensor. |
- Run the neural network model and produce one output result from one input sample. Because this is an object detection network, there are two output tensors that produce results.
model_output(tf1,0,false);
model_output(tf1,1,false);
int model_output(ANeuralNetworksTFLite* tf,int index,bool show_tensor){
auto time6 = std::chrono::high_resolution_clock::now();
TFLiteTensor outputTensor;
if (ANeuralNetworksTFLite_getTensorByIndex(tf,
TFLITE_BUFFER_TYPE_OUTPUT,
&outputTensor,
index) != ANEURALNETWORKS_NO_ERROR){
ANeuralNetworksTFLite_free(tf);
return 0;
}
// ... (the full app also prints the output tensor info and timing here; omitted in this excerpt)
return 0;
}
- Process the network results. For this object detection network, there is quite a bit of processing that needs to be done on the network outputs. Specifically, the non-maximum suppression (NMS) calculation must be done on the outputs. This calculation takes two inputs, representing region proposals and confidence scores for each region, and computes the highest-likelihood results by applying NMS.
ssd_post_process(tf1,true,ResultMSG);
Note: |
The entire output processing code in the app is too large to reprint here. Please refer to the ssd_post_process() function inside ssd.cpp for details. This example app follows the common NMS implementation, which has many references online. |
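For readers unfamiliar with NMS, the sketch below shows the standard greedy algorithm on its own. It is written in Python for brevity and is illustrative only; it is not the app’s ssd_post_process() implementation.
# Greedy (hard) non-maximum suppression: repeatedly keep the highest-scoring
# box and drop the remaining boxes that overlap it above an IoU threshold.
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences."""
    order = np.argsort(scores)[::-1]      # indices, highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the chosen box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Keep only the boxes that do not overlap the chosen box too much
        order = order[1:][iou <= iou_threshold]
    return keep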
3.3.3.5 Expected Output
This application has no GUI, so it emits only text output. When run, it displays results similar to the following.
-------- Load model --------
Create and init time: 363.635 ms
Get input tensor , time: 0.542462 ms , type: 1 , dimsSize: 4 , bufferSize: 270000 , dims=[ 1 300 300 3 ]
-------- Inference --------
Inference time: 10.294231 ms
Get output tensor , time: 0.003000 ms , type: 1 , dimsSize: 3 , bufferSize: 40257 , dims=[ 1 1917 21 ]
Get output tensor , time: 0.001307 ms , type: 1 , dimsSize: 3 , bufferSize: 7668 , dims=[ 1 1917 4 ]
Total 1 detections found:
Top 1 Results:
[Top 1 0.6517] person X:47.19 Y:1.79 W:123.33 H:224.55 BoxID:1794
Post-process time: 6.761308 ms
Total time: 17.086 ms
----- UnitTest verify -----
Performance correct
Accuracy correct