5. NeuroPilot API Reference

5.1. Converter Tool API Documentation

5.1.1. Backward-Incompatible Changes

5.1.1.1. Version 1.9.0

Converter Tool version 1.9.0 introduces the following changes.

  • Rules for the attribute names passed to the custom_attributes argument in mtk_converter.tfv1.CustomOpAnnotator have changed. Users no longer need to add a leading _ character in the attribute name, or specify the attribute type in the attribute name.

  • The custom_lib_map argument of the mtk_converter.dissect_tflite_model function, the mtk_converter.TFLiteExecutor class, the mtk_converter.MlirExecutor class, and the mtk_converter.TensorFlowV1Converter class is renamed to custom_op_lib_map.

5.1.1.2. Version 1.5.0

Converter Tool version 1.5.0 introduces the following changes.

  • The behavior of the converter options use_symmetric_quantization and use_weights_symmetric_quantization has changed.

    Previously, these two options forced all the weight and activation tensors to be quantized with symmetric quantization value ranges. Now these two options affect only the post-training quantization process. This means only quantization ranges that are deduced from the provided calibration dataset are affected. Quantization ranges obtained from the FakeQuantize operator (typically produced by the quantization-aware training tool) are kept unchanged.

5.1.1.3. Version 1.3.0

Converter Tool version 1.3.0 introduces the following changes, in order to better support mixed-precision models.

  • The converter option default_quantization_bitwidth was removed.

    Previously, Converter Tool used the converter option default_quantization_bitwidth to set the default bitwidth for all tensors. Now Converter Tool provides the converter option input_quantization_bitwidths for users to set the bitwidths for each model input tensor. For details of how the tensor bitwidths are deduced when converting a quantized model, see 4.1.1.2.2.3. Quantization Bitwidth.

  • The default behavior of converter option default_weights_quantization_bitwidth has changed.

    The default bitwidth of the weight tensors is now the bitwidth of the input tensor of the corresponding convolution.

    • For a convolution with a 16-bit input, the default bitwidth of the weight tensor is 16.

    • For a convolution with an 8-bit input, the default bitwidth of the weight tensor is 8.

  • The MediaTek TFLite custom operator extension (custom operators with MTKEXT_ prefix) was added.

    The new MediaTek TFLite custom operator extension handles cases where the model cannot be expressed by the TFLite built-in operators. This new TFLite custom operator extension is only supported in the NeuroPilot 5.0 SDK.

    Note

    To switch back to the legacy behavior from NeuroPilot 4.0 when converting to a TFLite model, pass one of the following values to the tflite_op_export_spec argument (see the sketch after this list of changes).

    • 'legacy': Export the TFLite model with legacy MediaTek TFLite custom operator definitions, and follow the versioning policy of the TFLite built-in operators.

    • 'legacy_ignore_version': Export the TFLite model with legacy MediaTek TFLite custom operator definitions, but do not follow the versioning policy of the TFLite built-in operators. This is typically used when the model is deployed with Neuron SDK.

  • The default value of converter option allow_8w16a_affine_operators has changed from False to True.
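
The following is a minimal sketch of switching back to the legacy export behavior described in the note above; it assumes converter is one of the converter objects documented in Section 5.1.2.

    # Hypothetical: 'converter' is any mtk_converter converter object created earlier.
    tflite_buffer = converter.convert_to_tflite(
        output_file='model_legacy.tflite',
        tflite_op_export_spec='legacy',  # or 'legacy_ignore_version'
    )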

5.1.2. Python API

mtk_converter.CaffeConverter(caffe_net, ...)

Caffe Converter class.

mtk_converter.TensorFlowV1Converter(...)

TensorFlow V1 Converter class.

mtk_converter.TensorFlowConverter(concrete_funcs)

TensorFlow Converter class.

mtk_converter.PyTorchConverter(...[, ...])

PyTorch Converter class.

mtk_converter.MlirExecutor(filename_or_binary)

MLIR model executor class.

mtk_converter.TFLiteExecutor(filename_or_binary)

TFLite model executor class.

mtk_converter.TFLiteParser(tflite_filename)

TFLite model parser class.

mtk_converter.dissect_tflite_model(...[, ...])

Dissect the given TFLite model layer-by-layer.

mtk_converter.export_tflite_model_from_dict(...)

Export the JSON-like dict object as a TFLite model.

mtk_converter.load_tflite_model_as_dict(filename)

Load the TFLite model as a JSON-like dict object via the flatc FlatBuffers compiler.

mtk_converter.plot_model_quant_error(...[, ...])

Plot the quantization error between float/fake-quantized model and quantized model.

mtk_converter.tfv1.CustomOpAnnotator(...[, ...])

A class that helps to build custom operators in a TensorFlow V1 graph.

mtk_converter.tfv1.resolve_custom_op_annotations(...)

Resolve the annotations in the TensorFlow model.

mtk_converter.sysconfig.get_include_dir()

Get the directory containing the C++ header files.

mtk_converter.sysconfig.get_lib_dir()

Get the directory containing the C++ library files.

mtk_converter.sysconfig.get_compile_flags()

Get the compilation flags for building the implementations of custom operators.

mtk_converter.sysconfig.get_link_flags()

Get the link flags for building the implementations of custom operators.

class mtk_converter.CaffeConverter(caffe_net, input_names, input_shapes, output_names)

Caffe Converter class.

Parameters:
  • caffe_net – A caffe_pb2.NetParameter object. The Caffe model to be converted.

  • input_names – A list of str values. The input tensor names.

  • input_shapes – A list of list of int values. The input tensor shapes.

  • output_names – A list of str values. The output tensor names.

convert_to_mlir(output_file=None)

Convert the given model to an MLIR model based on the provided options.

Parameters:

output_file – A str value. Path to export the output MLIR model file. Defaults to None.

Returns:

An MLIR model buffer as a bytes object.

convert_to_tflite(output_file=None, tflite_op_export_spec=None)

Convert the given model to a TFLite model based on the provided options.

Parameters:
  • output_file – A str value. Path to export the output TFLite model file. Defaults to None.

  • tflite_op_export_spec – A str value. The specification of how the operators are exported to TFLite. Defaults to 'builtin_first'.

Returns:

A TFLite model buffer as a bytes object.

classmethod from_model_files(prototxt_file, caffemodel_file, input_names=None, input_shapes=None, output_names=None)

Create the CaffeConverter object from Caffe model files.

Parameters:
  • prototxt_file – A str value. Path to the Caffe prototxt file (.prototxt).

  • caffemodel_file – A str value. Path to the Caffe model file (.caffemodel).

  • input_names – A list of str values. The input tensor names. Defaults to the input tensor names deduced from the Caffe model.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to the input tensor shapes deduced from the Caffe model.

  • output_names – A list of str values. The output tensor names. Defaults to the output tensor names deduced from the Caffe model.

Returns:

A CaffeConverter object.
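
A minimal usage sketch; the Caffe model file names below are hypothetical placeholders.

    import mtk_converter

    # Create the converter from hypothetical Caffe model files.
    converter = mtk_converter.CaffeConverter.from_model_files(
        prototxt_file='model.prototxt',
        caffemodel_file='model.caffemodel',
    )

    # Export a floating-point TFLite model; the TFLite buffer is also returned as bytes.
    tflite_buffer = converter.convert_to_tflite(output_file='model.tflite')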

get_available_options()

Get the available option names.

Returns:

A list of available option names.

property allow_4w8a_affine_operators

A bool value. Whether to allow affine operators with 8-bit input tensors and 4-bit weight tensors. Defaults to True.

property allow_8w16a_affine_operators

A bool value. Whether to allow affine operators with 16-bit input tensors and 8-bit weight tensors. Defaults to True.

property allow_different_affine_output_quantization_type

A bool value. Whether to allow a different quantization type for the output tensor of affine operators. Defaults to True.

property allow_dynamic_quantization

A bool value. Whether to allow dynamic quantization for affine operators. Defaults to False.

property allow_incompatible_paddings_for_tflite_pooling

A bool value. Whether to allow pooling operators whose padding settings are not compatible with the TFLite padding types. If True, these invalid padding settings are replaced with preceding Pad operators. Note that the model execution result might differ after the replacement in some cases. Defaults to False.

property allow_missing_quantization_ranges

A bool value. Whether to allow missing min/max values of the tensors when quantizing the model. Defaults to False.

property append_output_dequantize_ops

A bool value. Whether to keep the model output as floating-point type. If True, Dequantize operators are inserted at the end of the model to dequantize the output tensors. Only takes effect when converting a quantized model. Defaults to False.

property calibration_data_count

An int value. The number of calibration data batches to use when doing post-training quantization. If not set, use the number of elements in the iterator returned by calibration_data_gen. Defaults to None.

property calibration_data_gen

A function that returns an iterator over the stream of input data used for post-training quantization. The input data can be (1) a list of input numpy arrays or (2) a dict that maps each input tensor name to its corresponding numpy array. Defaults to None.

property convert_unsupported_data_type

A bool value. Whether to convert unsupported data types to supported ones. Note that this conversion is not equivalent and may cause a quality drop in some cases. Defaults to True. This option cannot be changed for now.

property default_weights_quantization_bitwidth

An int value. The default quantization bitwidth for the weights of affine operators. Takes effect when there are no corresponding FakeQuantize operators. If not set, the default weights quantization bitwidth follows the bitwidth of the input tensor of each corresponding affine operator. Defaults to None.

property enable_12bit_data_types

A bool value. Whether to enable 12-bit data types, such as INT12 and UINT12, in the model. If False, tensors with 12-bit quantization bitwidth are expanded to a larger bitwidth (i.e., 16-bit). Defaults to False.

property ensure_safe_affine_output_quantization_scale

A bool value. Whether to ensure a safe quantization scale for affine operators, that is, to guarantee that the output scale is larger than the product of the input scale and the weight scale. Defaults to False.

property ensure_same_concat_quantization_params

A bool value. Whether to unify the quantization parameters (scale and zero_point) of the input and output tensors of Concat operators. Defaults to False.

property input_quantization_bitwidths

An int value or a list of int values. The quantization bitwidths for the model input tensors. To skip specific input tensors, set the corresponding entries to None. If only a single bitwidth setting is provided, it is applied to all the model input tensors. Defaults to 8.

property input_value_ranges

A list of (min, max) float tuples (e.g., [(min_0, max_0), …, (min_N, max_N)]). The value ranges (used for quantization) of the model input tensors. To skip specific input tensors, set the corresponding min/max tuple to None. Defaults to None, i.e., no value ranges are set for the model input tensors.

property prepend_input_quantize_ops

A bool value. Whether to keep the model input as floating-point type. If True, Quantize operators are inserted at the beginning of the model to quantize the input tensors. Only takes effect when converting a quantized model. Defaults to False.

property quantize

A bool value. Whether to quantize the model. Defaults to False.

property use_per_output_channel_quantization

A bool value. Whether to apply per-channel quantization to the weights of affine operators when doing post-training quantization. Defaults to True.

property use_symmetric_quantization

A bool value. Whether to apply symmetric quantization to all tensors except the weights of affine operators when doing post-training quantization. Defaults to False.

property use_unsigned_quantization_type

A bool value. Whether to use unsigned data types for quantization when possible. Defaults to False.

property use_weights_symmetric_quantization

A bool value. Whether to apply symmetric quantization to all the weights of affine operators when doing post-training quantization. Defaults to True.

class mtk_converter.TensorFlowV1Converter(graph_def, input_names, input_shapes, output_names)

TensorFlow V1 Converter class.

Parameters:
  • graph_def – The tf.GraphDef object. The TensorFlow graph to be converted.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored.

  • input_shapes – A list of list of int values. The input tensor shapes.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored.

convert_to_mlir(output_file=None)

Convert the given model to an MLIR model based on the provided options.

Parameters:

output_file – A str value. Path to export the output MLIR model file. Defaults to None.

Returns:

An MLIR model buffer as a bytes object.

convert_to_tflite(output_file=None, tflite_op_export_spec=None)

Convert the given model to a TFLite model based on the provided options.

Parameters:
  • output_file – A str value. Path to export the output TFLite model file. Defaults to None.

  • tflite_op_export_spec – A str value. The specification of how the operators are exported to TFLite. Defaults to 'builtin_first'.

Returns:

A TFLite model buffer as a bytes object.

classmethod from_frozen_graph_def(graph_def, input_names, input_shapes, output_names)

Create the TensorFlowV1Converter object from frozen GraphDef object.

Parameters:
  • graph_def – The tf.GraphDef object. The TensorFlow graph to be converted.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored.

  • input_shapes – A list of list of int values. The input tensor shapes.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored.

Returns:

TensorFlowV1Converter object.

classmethod from_frozen_graph_def_file(graph_def_file, input_names, input_shapes, output_names)

Create the TensorFlowV1Converter object from frozen GraphDef model file.

Parameters:
  • graph_def_file – A str value. The path to the GraphDef model file to be converted.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored.

  • input_shapes – A list of list of int values. The input tensor shapes.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored.

Returns:

TensorFlowV1Converter object.
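
A minimal sketch of converting a hypothetical frozen GraphDef file; the file name, tensor names, and shapes are placeholders.

    import mtk_converter

    converter = mtk_converter.TensorFlowV1Converter.from_frozen_graph_def_file(
        graph_def_file='frozen_model.pb',
        input_names=['input'],            # the ':0' postfix may be omitted
        input_shapes=[[1, 224, 224, 3]],
        output_names=['output'],
    )
    mlir_buffer = converter.convert_to_mlir(output_file='model.mlir')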

classmethod from_keras_model(keras_model, input_names=None, input_shapes=None, output_names=None, default_batch_size=None)

Create the TensorFlowV1Converter object from TensorFlow keras model object.

Parameters:
  • keras_model – A tf.keras.Model object. The tf.keras model to be converted.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the input tensor names defined in the keras model.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to None, i.e., the input tensor shapes defined in the keras model.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the output tensor names defined in the keras model.

  • default_batch_size – An int value. The batch size to use when the input shapes deduced from the keras model have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. Defaults to None, i.e., the input shapes remain unchanged.

Returns:

TensorFlowV1Converter object.

classmethod from_keras_model_file(keras_model_file, input_names=None, input_shapes=None, output_names=None, custom_objects=None, default_batch_size=None)

Create the TensorFlowV1Converter object from TensorFlow keras HDF5 model file.

Parameters:
  • keras_model_file – A str value. Path to the tf.keras HDF5 model file.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the input tensor names defined in the keras model.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to None, i.e., the input tensor shapes defined in the keras model.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the output tensor names defined in the keras model.

  • custom_objects – Dict mapping names (strings) to custom classes or functions to be considered during deserialization. Will be passed to the tf.keras.models.load_model function. Defaults to None.

  • default_batch_size – An int value. The batch size to use when the input shapes deduced from the keras model have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. Defaults to None, i.e., the input shapes remain unchanged.

Returns:

TensorFlowV1Converter object.

classmethod from_saved_model_dir(saved_model_dir, input_names=None, input_shapes=None, output_names=None, tag_set=None, signature_key=None, default_batch_size=None)

Create the TensorFlowV1Converter object from TensorFlow SavedModel.

Parameters:
  • saved_model_dir – A str value. Path to the SavedModel directory.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the input tensor names defined in the SignatureDef.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to None, i.e., the input tensor shapes defined in the SignatureDef.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the output tensor names defined in the SignatureDef.

  • tag_set – Set of tags identifying the MetaGraphDef within the SavedModel to convert. Defaults to None, i.e., set(tf.saved_model.tag_constants.SERVING).

  • signature_key – Key identifying SignatureDef containing inputs and outputs. Defaults to None, i.e., tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY.

  • default_batch_size – An int value. The batch size to use when the input shapes deduced from the SavedModel have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. Defaults to None, i.e., the input shapes remain unchanged.

Returns:

TensorFlowV1Converter object.

classmethod from_session(session, input_names, input_shapes, output_names)

Create the TensorFlowV1Converter object from a TensorFlow session object.

Parameters:
  • session – The tf.Session object. The TensorFlow session containing the graph to be converted. The graph should have been initialized already.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored.

  • input_shapes – A list of list of int values. The input tensor shapes.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored.

Returns:

TensorFlowV1Converter object.

get_available_options()

Get the available option names.

Returns:

A list of available option names.

property allow_4w8a_affine_operators

A bool value. Whether to allow affine operators with 8-bit input tensors and 4-bit weight tensors. Defaults to True.

property allow_8w16a_affine_operators

A bool value. Whether to allow affine operators with 16-bit input tensors and 8-bit weight tensors. Defaults to True.

property allow_different_affine_output_quantization_type

A bool value. Whether to allow a different quantization type for the output tensor of affine operators. Defaults to True.

property allow_dynamic_quantization

A bool value. Whether to allow dynamic quantization for affine operators. Defaults to False.

property allow_incompatible_paddings_for_tflite_pooling

A bool value. Whether to allow pooling operators whose padding settings are not compatible with the TFLite padding types. If True, these invalid padding settings are replaced with preceding Pad operators. Note that the model execution result might differ after the replacement in some cases. Defaults to False.

property allow_missing_quantization_ranges

A bool value. Whether to allow missing min/max values of the tensors when quantizing the model. Defaults to False.

property append_output_dequantize_ops

A bool value. Whether to keep the model output as floating-point type. If True, Dequantize operators are inserted at the end of the model to dequantize the output tensors. Only takes effect when converting a quantized model. Defaults to False.

property calibration_data_count

An int value. The number of calibration data batches to use when doing post-training quantization. If not set, use the number of elements in the iterator returned by calibration_data_gen. Defaults to None.

property calibration_data_gen

A function that returns an iterator over the stream of input data used for post-training quantization. The input data can be (1) a list of input numpy arrays or (2) a dict that maps each input tensor name to its corresponding numpy array. Defaults to None.

property convert_unsupported_data_type

A bool value. Whether to convert unsupported data types to supported ones. Note that this conversion is not equivalent and may cause a quality drop in some cases. Defaults to True. This option cannot be changed for now.

property custom_op_lib_map

A dictionary with string keys and string values. The mapping of custom operator op_type (key) to the shared library name (value) of its implementation.

property default_weights_quantization_bitwidth

An int value. The default quantization bitwidth for the weights of affine operators. Takes effect when there are no corresponding FakeQuantize operators. If not set, the default weights quantization bitwidth follows the bitwidth of the input tensor of each corresponding affine operator. Defaults to None.

property enable_12bit_data_types

A bool value. Whether to enable 12-bit data types, such as INT12 and UINT12, in the model. If False, tensors with 12-bit quantization bitwidth are expanded to a larger bitwidth (i.e., 16-bit). Defaults to False.

property ensure_safe_affine_output_quantization_scale

A bool value. Whether to ensure a safe quantization scale for affine operators, that is, to guarantee that the output scale is larger than the product of the input scale and the weight scale. Defaults to False.

property ensure_same_concat_quantization_params

A bool value. Whether to unify the quantization parameters (scale and zero_point) of the input and output tensors of Concat operators. Defaults to False.

property input_quantization_bitwidths

An int value or a list of int values. The quantization bitwidths for the model input tensors. To skip specific input tensors, set the corresponding entries to None. If only a single bitwidth setting is provided, it is applied to all the model input tensors. Defaults to 8.

property input_value_ranges

A list of (min, max) float tuples (e.g., [(min_0, max_0), …, (min_N, max_N)]). The value ranges (used for quantization) of the model input tensors. To skip specific input tensors, set the corresponding min/max tuple to None. Defaults to None, i.e., no value ranges are set for the model input tensors.

property prepend_input_quantize_ops

A bool value. Whether to keep the model input as floating-point type. If True, Quantize operators are inserted at the beginning of the model to quantize the input tensors. Only takes effect when converting a quantized model. Defaults to False.

property quantize

A bool value. Whether to quantize the model. Defaults to False.

property use_per_output_channel_quantization

A bool value. Whether to apply per-channel quantization to the weights of affine operators when doing post-training quantization. Defaults to True.

property use_symmetric_quantization

A bool value. Whether to apply symmetric quantization to all tensors except the weights of affine operators when doing post-training quantization. Defaults to False.

property use_unsigned_quantization_type

A bool value. Whether to use unsigned data types for quantization when possible. Defaults to False.

property use_weights_symmetric_quantization

A bool value. Whether to apply symmetric quantization to all the weights of affine operators when doing post-training quantization. Defaults to True.

class mtk_converter.TensorFlowConverter(concrete_funcs, input_names=None, input_shapes=None, output_names=None, default_batch_size=None)

TensorFlow Converter class.

Parameters:
  • concrete_funcs – A list of TensorFlow ConcreteFunction objects. The concrete function objects to be converted.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., deducing from the input tensor names defined in the ConcreteFunction.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to None, i.e., the input tensor shapes defined in the ConcreteFunction.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the output tensor names defined in the ConcreteFunction.

  • default_batch_size – An int value. The batch size to use when the input shapes deduced from the ConcreteFunction have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. Defaults to None, i.e., the input shapes remain unchanged.

convert_to_mlir(output_file=None)

Convert the given model to an MLIR model based on the provided options.

Parameters:

output_file – A str value. Path to export the output MLIR model file. Defaults to None.

Returns:

An MLIR model buffer as a bytes object.

convert_to_tflite(output_file=None, tflite_op_export_spec=None)

Convert the given model to a TFLite model based on the provided options.

Parameters:
  • output_file – A str value. Path to export the output TFLite model file. Defaults to None.

  • tflite_op_export_spec – A str value. The specification of how the operators are exported to TFLite. Defaults to 'builtin_first'.

Returns:

A TFLite model buffer as a bytes object.

classmethod from_concrete_functions(concrete_functions, input_names=None, input_shapes=None, output_names=None, default_batch_size=None)

Create the TensorFlowConverter object from ConcreteFunctions.

Parameters:
  • concrete_funcs – A list of TensorFlow ConcreteFunction objects. The concrete function objects to be converted.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., deducing from the input tensor names defined in the ConcreteFunction.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to None, i.e., the input tensor shapes defined in the ConcreteFunction.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the output tensor names defined in the ConcreteFunction.

  • default_batch_size – An int value. The batch size to use when the input shapes deduced from the ConcreteFunction have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. Defaults to None, i.e., the input shapes remain unchanged.

Returns:

TensorFlowConverter object.

classmethod from_keras_model(keras_model, input_names=None, input_shapes=None, output_names=None, default_batch_size=None)

Create the TensorFlowConverter object from TensorFlow keras model.

Parameters:
  • keras_model – A tf.keras.Model object. The tf.keras model to be converted.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the input tensor names defined in the keras model.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to None, i.e., the input tensor shapes defined in the keras model.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the output tensor names defined in the keras model.

  • default_batch_size – An int value. The batch size to use when the input shapes deduced from the keras model have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. Defaults to None, i.e., the input shapes remain unchanged.

Returns:

TensorFlowConverter object.
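
A minimal sketch of converting a small tf.keras model built purely for illustration.

    import tensorflow as tf
    import mtk_converter

    # A toy model with a dynamic batch dimension.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(224, 224, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])

    # default_batch_size fixes the dynamic batch dimension deduced from the model.
    converter = mtk_converter.TensorFlowConverter.from_keras_model(
        model, default_batch_size=1)
    tflite_buffer = converter.convert_to_tflite(output_file='keras_model.tflite')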

classmethod from_keras_model_file(keras_model_file, input_names=None, input_shapes=None, output_names=None, custom_objects=None, default_batch_size=None)

Create the TensorFlowConverter object from TensorFlow keras HDF5 model file.

Parameters:
  • keras_model_file – A str value. Path to the tf.keras HDF5 model file.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the input tensor names defined in the keras model.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to None, i.e., the input tensor shapes defined in the keras model.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the output tensor names defined in the keras model.

  • custom_objects – Dict mapping names (strings) to custom classes or functions to be considered during deserialization. Will be passed to the tf.keras.models.load_model function. Defaults to None.

  • default_batch_size – An int value. The batch size to use when the input shapes deduced from the keras model have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. Defaults to None, i.e., the input shapes remain unchanged.

Returns:

TensorFlowConverter object.

classmethod from_saved_model_dir(saved_model_dir, input_names=None, input_shapes=None, output_names=None, tag_set=None, signature_keys=None, default_batch_size=None)

Create the TensorFlowConverter object from TensorFlow SavedModel.

Parameters:
  • saved_model_dir – A str value. Path to the SavedModel directory.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the input tensor names defined in the SignatureDef.

  • input_shapes – A list of list of int values. The input tensor shapes. Defaults to None, i.e., the input tensor shapes defined in the SignatureDef.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., the output tensor names defined in the SignatureDef.

  • tag_set – Set of tags identifying the MetaGraphDef within the SavedModel to convert. Defaults to None, i.e., set(tf.saved_model.tag_constants.SERVING).

  • signature_keys – List of keys to identify the SignatureDef containing the inputs and outputs. Defaults to None, i.e., list(tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY).

  • default_batch_size – An int value. The batch size to use when the input shapes deduced from the SavedModel have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. Defaults to None, i.e., the input shapes remain unchanged.

Returns:

TensorFlowConverter object.

get_available_options()

Get the available option names.

Returns:

A list of available option names.

property allow_4w8a_affine_operators

A bool value. Whether to allow affine operators with 8-bit input tensors and 4-bit weight tensors. Defaults to True.

property allow_8w16a_affine_operators

A bool value. Whether to allow affine operators with 16-bit input tensors and 8-bit weight tensors. Defaults to True.

property allow_different_affine_output_quantization_type

A bool value. Whether to allow a different quantization type for the output tensor of affine operators. Defaults to True.

property allow_dynamic_quantization

A bool value. Whether to allow dynamic quantization for affine operators. Defaults to False.

property allow_incompatible_paddings_for_tflite_pooling

A bool value. Whether to allow pooling operators whose padding settings are not compatible with the TFLite padding types. If True, these invalid padding settings are replaced with preceding Pad operators. Note that the model execution result might differ after the replacement in some cases. Defaults to False.

property allow_missing_quantization_ranges

A bool value. Whether to allow missing min/max values of the tensors when quantizing the model. Defaults to False.

property append_output_dequantize_ops

A bool value. Whether to keep the model output as floating-point type. If True, Dequantize operators are inserted at the end of the model to dequantize the output tensors. Only takes effect when converting a quantized model. Defaults to False.

property calibration_data_count

An int value. The number of calibration data batches to use when doing post-training quantization. If not set, use the number of elements in the iterator returned by calibration_data_gen. Defaults to None.

property calibration_data_gen

A function that returns an iterator over the stream of input data used for post-training quantization. The input data can be (1) a list of input numpy arrays or (2) a dict that maps each input tensor name to its corresponding numpy array. Defaults to None.

property convert_unsupported_data_type

A bool value. Whether to convert unsupported data types to supported ones. Note that this conversion is not equivalent and may cause a quality drop in some cases. Defaults to True. This option cannot be changed for now.

property default_weights_quantization_bitwidth

An int value. The default quantization bitwidth for the weights of affine operators. Takes effect when there are no corresponding FakeQuantize operators. If not set, the default weights quantization bitwidth follows the bitwidth of the input tensor of each corresponding affine operator. Defaults to None.

property enable_12bit_data_types

A bool value. Whether to enable 12-bit data types, such as INT12 and UINT12, in the model. If False, tensors with 12-bit quantization bitwidth are expanded to a larger bitwidth (i.e., 16-bit). Defaults to False.

property ensure_safe_affine_output_quantization_scale

A bool value. Whether to ensure a safe quantization scale for affine operators, that is, to guarantee that the output scale is larger than the product of the input scale and the weight scale. Defaults to False.

property ensure_same_concat_quantization_params

A bool value. Whether to unify the quantization parameters (scale and zero_point) of the input and output tensors of Concat operators. Defaults to False.

property input_quantization_bitwidths

An int value or a list of int values. The quantization bitwidths for the model input tensors. To skip specific input tensors, set the corresponding entries to None. If only a single bitwidth setting is provided, it is applied to all the model input tensors. Defaults to 8.

property input_value_ranges

A list of (min, max) float tuples (e.g., [(min_0, max_0), …, (min_N, max_N)]). The value ranges (used for quantization) of the model input tensors. To skip specific input tensors, set the corresponding min/max tuple to None. Defaults to None, i.e., no value ranges are set for the model input tensors.

property prepend_input_quantize_ops

A bool value. Whether to keep the model input as floating-point type. If True, Quantize operators are inserted at the beginning of the model to quantize the input tensors. Only takes effect when converting a quantized model. Defaults to False.

property quantize

A bool value. Whether to quantize the model. Defaults to False.

property use_per_output_channel_quantization

A bool value. Whether to apply per-channel quantization to the weights of affine operators when doing post-training quantization. Defaults to True.

property use_symmetric_quantization

A bool value. Whether to apply symmetric quantization to all tensors except the weights of affine operators when doing post-training quantization. Defaults to False.

property use_unsigned_quantization_type

A bool value. Whether to use unsigned data types for quantization when possible. Defaults to False.

property use_weights_symmetric_quantization

A bool value. Whether to apply symmetric quantization to all the weights of affine operators when doing post-training quantization. Defaults to True.

class mtk_converter.PyTorchConverter(script_module, input_shapes, input_types=None)

PyTorch Converter class.

Parameters:
  • script_module – The ScriptModule object to be converted.

  • input_shapes – A list of list of int values. The input tensor shapes.

  • input_types – A list of torch.dtype values or type strings. The input tensor types of the input ScriptModule model. Can be (1) a PyTorch type: torch.float32, torch.float64, torch.int32, torch.int64, torch.bool, or (2) a type string: float32, float64, int32, int64, bool. Should have the same number of entries as the model input tensors. Defaults to None (i.e., all input tensors will have float32 type).

convert_to_mlir(output_file=None)

Convert the given model to an MLIR model based on the provided options.

Parameters:

output_file – A str value. Path to export the output MLIR model file. Defaults to None.

Returns:

An MLIR model buffer as a bytes object.

convert_to_tflite(output_file=None, tflite_op_export_spec=None)

Convert the given model to a TFLite model based on the provided options.

Parameters:
  • output_file – A str value. Path to export the output TFLite model file. Defaults to None.

  • tflite_op_export_spec – A str value. The specification of how the operators are exported to TFLite. Defaults to 'builtin_first'.

Returns:

A TFLite model buffer as a bytes object.

classmethod from_script_module(script_module, input_shapes, input_types=None)

Create the PyTorchConverter from the ScriptModule object.

Parameters:
  • script_module – The ScriptModule object to be converted.

  • input_shapes – A list of list of int values. The input tensor shapes.

  • input_types – A list of torch.dtype values or type strings. The input tensor types of the input ScriptModule model. Can be (1) a PyTorch type: torch.float32, torch.float64, torch.int32, torch.int64, torch.bool, or (2) a type string: float32, float64, int32, int64, bool. Should have the same number of entries as the model input tensors. Defaults to None (i.e., all input tensors will have float32 type).

Returns:

PyTorchConverter object.

classmethod from_script_module_file(script_module_file, input_shapes, input_types=None)

Create the PyTorchConverter from the ScriptModule file.

Parameters:
  • script_module_file – A str value. Path to the ScriptModule file to be converted.

  • input_shapes – A list of list of int values. The input tensor shapes.

  • input_types – A list of torch.dtype values or type strings. The input tensor types of the input ScriptModule model. Can be (1) a PyTorch type: torch.float32, torch.float64, torch.int32, torch.int64, torch.bool, or (2) a type string: float32, float64, int32, int64, bool. Should have the same number of entries as the model input tensors. Defaults to None (i.e., all input tensors will have float32 type).

Returns:

PyTorchConverter object.
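
A minimal sketch of converting a toy traced ScriptModule; the module and file name are for illustration only.

    import torch
    import mtk_converter

    class TinyNet(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = torch.nn.Conv2d(3, 8, 3)

        def forward(self, x):
            return torch.relu(self.conv(x))

    # Trace the module into a ScriptModule and save it to a file.
    script_module = torch.jit.trace(TinyNet().eval(), torch.rand(1, 3, 224, 224))
    script_module.save('tiny_net.pt')

    converter = mtk_converter.PyTorchConverter.from_script_module_file(
        'tiny_net.pt', input_shapes=[[1, 3, 224, 224]])
    tflite_buffer = converter.convert_to_tflite(output_file='tiny_net.tflite')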

get_available_options()

Get the available option names.

Returns:

A list of available option names.

property allow_4w8a_affine_operators

A bool value. Whether to allow affine operators with 8-bit input tensors and 4-bit weight tensors. Defaults to True.

property allow_8w16a_affine_operators

A bool value. Whether to allow affine operators with 16-bit input tensors and 8-bit weight tensors. Defaults to True.

property allow_different_affine_output_quantization_type

A bool value. Whether to allow a different quantization type for the output tensor of affine operators. Defaults to True.

property allow_dynamic_quantization

A bool value. Whether to allow dynamic quantization for affine operators. Defaults to False.

property allow_incompatible_paddings_for_tflite_pooling

A bool value. Whether to allow pooling operators whose padding settings are not compatible with the TFLite padding types. If True, these invalid padding settings are replaced with preceding Pad operators. Note that the model execution result might differ after the replacement in some cases. Defaults to False.

property allow_missing_quantization_ranges

A bool value. Whether to allow missing min/max values of the tensors when quantizing the model. Defaults to False.

property append_output_dequantize_ops

A bool value. Whether to keep the model output as floating-point type. If True, Dequantize operators are inserted at the end of the model to dequantize the output tensors. Only takes effect when converting a quantized model. Defaults to False.

property calibration_data_count

An int value. The number of calibration data batches to use when doing post-training quantization. If not set, use the number of elements in the iterator returned by calibration_data_gen. Defaults to None.

property calibration_data_gen

A function that returns an iterator over the stream of input data used for post-training quantization. The input data can be (1) a list of input numpy arrays or (2) a dict that maps each input tensor name to its corresponding numpy array. Defaults to None.

property convert_unsupported_data_type

A bool value. Whether to convert unsupported data types to supported ones. Note that this conversion is not equivalent and may cause a quality drop in some cases. Defaults to True. This option cannot be changed for now.

property default_weights_quantization_bitwidth

An int value. The default quantization bitwidth for the weights of affine operators. Takes effect when there are no corresponding FakeQuantize operators. If not set, the default weights quantization bitwidth follows the bitwidth of the input tensor of each corresponding affine operator. Defaults to None.

property enable_12bit_data_types

A bool value. Whether to enable 12-bit data types, such as INT12 and UINT12, in the model. If False, tensors with 12-bit quantization bitwidth are expanded to a larger bitwidth (i.e., 16-bit). Defaults to False.

property ensure_safe_affine_output_quantization_scale

A bool value. Whether to ensure a safe quantization scale for affine operators, that is, to guarantee that the output scale is larger than the product of the input scale and the weight scale. Defaults to False.

property ensure_same_concat_quantization_params

A bool value. Whether to unify the quantization parameters (scale and zero_point) of the input and output tensors of Concat operators. Defaults to False.

property input_quantization_bitwidths

An int value or a list of int values. The quantization bitwidths for the model input tensors. To skip specific input tensors, set the corresponding entries to None. If only a single bitwidth setting is provided, it is applied to all the model input tensors. Defaults to 8.

property input_value_ranges

A list of (min, max) float tuples (e.g., [(min_0, max_0), …, (min_N, max_N)]). The value ranges (used for quantization) of the model input tensors. To skip specific input tensors, set the corresponding min/max tuple to None. Defaults to None, i.e., no value ranges are set for the model input tensors.

property prepend_input_quantize_ops

A bool value. Whether to keep the model input as floating-point type. If True, Quantize operators are inserted at the beginning of the model to quantize the input tensors. Only takes effect when converting a quantized model. Defaults to False.

property quantize

A bool value. Whether to quantize the model. Defaults to False.

property use_per_output_channel_quantization

A bool value. Whether to apply per-channel quantization to the weights of affine operators when doing post-training quantization. Defaults to True.

property use_symmetric_quantization

A bool value. Whether to apply symmetric quantization to all tensors except the weights of affine operators when doing post-training quantization. Defaults to False.

property use_unsigned_quantization_type

A bool value. Whether to use unsigned data types for quantization when possible. Defaults to False.

property use_weights_symmetric_quantization

A bool value. Whether to apply symmetric quantization to all the weights of affine operators when doing post-training quantization. Defaults to True.

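The quantization-related options above are shared by all converter classes. The following is a minimal post-training quantization sketch; it assumes converter is any converter object created as shown earlier, with a hypothetical 1x3x224x224 float input.

    import numpy as np

    def calibration_data_gen():
        # Yield a few calibration batches. Real calibration data should come
        # from the target dataset; random data is used here only as a placeholder.
        for _ in range(10):
            yield [np.random.rand(1, 3, 224, 224).astype(np.float32)]

    converter.quantize = True
    converter.input_value_ranges = [(0.0, 1.0)]
    converter.calibration_data_gen = calibration_data_gen
    converter.calibration_data_count = 10
    converter.use_per_output_channel_quantization = True

    quantized_buffer = converter.convert_to_tflite(output_file='model_quant.tflite')
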
class mtk_converter.MlirExecutor(filename_or_binary, custom_op_lib_map=None, simulate_fp16=False)

MLIR model executor class.

Parameters:
  • filename_or_binary – A str or bytes value. The MLIR model filename or the binary content.

  • custom_op_lib_map – A dictionary with string keys and string values. The mapping of custom operator op_type (key) and shared library name (value) of its implementation. Defaults to None, i.e. no custom operator implementation will be used.

  • simulate_fp16 – A bool value. Whether to simulate the FP16 computation result. Defaults to False.

run(input_data_list_or_dict, output_list=None, quantize_input=False, dequantize_output=False)

Execute the model with given input data.

Parameters:
  • input_data_list_or_dict – Can be (1) a list of input numpy arrays or (2) a dict that maps each input tensor name to its corresponding numpy array.

  • output_list – A list of str values. The output tensor names. Defaults to the output tensor names of the primary subgraph.

  • quantize_input – A bool value. Whether to quantize the input data when running a quantized model. Defaults to False.

  • dequantize_output – A bool value. Whether to dequantize the output data when running a quantized model. Defaults to False.

Returns:

A list of numpy arrays. The output data.

class mtk_converter.TFLiteExecutor(filename_or_binary, custom_op_lib_map=None, simulate_fp16=False)

TFLite model executor class.

Parameters:
  • filename_or_binary – A str or bytes value. The TFLite model filename or the binary content.

  • custom_op_lib_map – A dictionary with string keys and string values. The mapping of custom operator op_type (key) and shared library name (value) of its implementation. Defaults to None, i.e. no custom operator implementation will be used.

  • simulate_fp16 – A bool value. Whether to simulate the FP16 computation result. Defaults to False.

run(input_data_list_or_dict, output_list=None, quantize_input=False, dequantize_output=False)

Execute the model with given input data.

Parameters:
  • input_data_list_or_dict – Can be (1) a list of input numpy arrays or (2) a dict that maps each input tensor name to its corresponding numpy array.

  • output_list – A list of str values. The output tensor names. Defaults to the output tensor names of the primary subgraph.

  • quantize_input – A bool value. Whether to quantize the input data when running a quantized model. Defaults to False.

  • dequantize_output – A bool value. Whether to dequantize the output data when running a quantized model. Defaults to False.

Returns:

A list of numpy arrays. The output data.
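
A minimal sketch of checking a converted model on the host; the model file name and input shape are placeholders.

    import numpy as np
    import mtk_converter

    executor = mtk_converter.TFLiteExecutor('model_quant.tflite')
    outputs = executor.run(
        [np.random.rand(1, 3, 224, 224).astype(np.float32)],
        quantize_input=True,      # quantize the float input for a quantized model
        dequantize_output=True,   # return float outputs
    )
    print([output.shape for output in outputs])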

class mtk_converter.TFLiteParser(tflite_filename)

TFLite model parser class.

Parameters:

tflite_filename – A str value. The TFLite model filename to be dissected.

as_dict()

Return the dict representation of the TFLite model.

get_constant_tensor_data(tensor_name_or_idx, subgraph_idx=None, dequantize=None)

Get the constant data of a specific tensor in a specific subgraph.

Parameters:
  • tensor_name_or_idx – An int value or a str value. The tensor index or tensor name of the tensor to query.

  • subgraph_idx – An int value. The index of the subgraph to be parsed. Defaults to 0, that is, the major subgraph of the TFLite model.

  • dequantize – A bool value. Whether to dequantize the constant data based on the quantization parameter of the given tensor. Take effect only when the given TFLite tensor is a quantized tensor.

Returns:

numpy.ndarray object. The tensor data.

get_input_tensor_details(subgraph_idx=None)

Get the input tensor details in a specific subgraph.

Parameters:

subgraph_idx – An int value. The index of the subgraph to be parsed. Defaults to 0, that is, the major subgraph of the TFLite model.

get_opcode_summary(subgraph_idx=None, ignore_op_version=None)

Summarize the operator codes in a specific subgraph.

This function summarizes the occurrence count of each operator type (and the corresponding version) in the TFLite subgraph.

Parameters:
  • subgraph_idx – An int value. The index of the subgraph to be parsed. Defaults to 0, that is, the major subgraph of the TFLite model.

  • ignore_op_version – A bool value. Whether to ignore the operator version when summarizing the operator codes. Defaults to False.

get_output_tensor_details(subgraph_idx=None)

Get the output tensor details in a specific subgraph.

Parameters:

subgraph_idx – An int value. The index of the subgraph to be parsed. Defaults to 0, that is, the major subgraph of the TFLite model.

get_tensor_detail(tensor_name_or_idx, subgraph_idx=None)

Get the detail of a specific tensor in a specific subgraph.

Parameters:
  • tensor_name_or_idx – An int value or a str value. The tensor index or tensor name of the tensor to query.

  • subgraph_idx – An int value. The index of the subgraph to be parsed. Defaults to 0, that is, the major subgraph of the TFLite model.
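
A minimal inspection sketch; the model file name is a placeholder.

    import mtk_converter

    parser = mtk_converter.TFLiteParser('model_quant.tflite')

    # Count the operator types (and versions) used in the major subgraph.
    print(parser.get_opcode_summary())

    # Inspect the model boundary tensors and a single tensor by index.
    print(parser.get_input_tensor_details())
    print(parser.get_output_tensor_details())
    print(parser.get_tensor_detail(0))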

mtk_converter.dissect_tflite_model(tflite_filename, input_data, export_dir, custom_op_lib_map=None, quiet=None)

Dissect the given TFLite model layer-by-layer.

This function is useful when analyzing the mismatches between the PC execution result and the device execution result of the same TFLite model. By comparing the execution results layer-by-layer, we can easily identify if any of the layers induce unexpected errors, without bothering with the error propagation issue.

This function creates multiple sub-directories in the given export_dir. Each sub-directory contains a dissected TFLite model (with only a single operator) as well as the corresponding reference input/output data files.

Parameters:
  • tflite_filename – A str value. The TFLite model filename to be dissected.

  • input_data – A numpy.ndarray object or a list of numpy.ndarray objects. The input data that is used to produce the reference input/output data pairs for all the dissected models.

  • export_dir – A str value. The directory name where the dissected models are saved.

  • custom_op_lib_map – A dictionary with string keys and string values. The mapping of custom operator custom_code (key) and shared library name (value) of its implementation. Defaults to None, i.e. no custom operator implementation will be used.

  • quiet – A bool value. Whether to disable the progress bar when dissecting the TFLite model. Defaults to False.
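
A minimal sketch; the model file name, input shape, and export directory are placeholders.

    import numpy as np
    import mtk_converter

    # Split the model into single-operator TFLite files plus reference I/O data,
    # so each layer can later be compared against its on-device result.
    mtk_converter.dissect_tflite_model(
        'model_quant.tflite',
        np.random.rand(1, 3, 224, 224).astype(np.float32),
        'dissect_output',
    )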

mtk_converter.export_tflite_model_from_dict(model_dict, output_file=None)

Export the JSON-like dict object as a TFLite model.

This function first exports the provided model_dict object to a JSON file, and then uses the flatc FlatBuffers compiler to produce the TFLite model.

Parameters:
  • model_dict – A dict object. The JSON-like representation of the TFLite model.

  • output_file – A str value. Path to export the output TFLite model file. Defaults to None.

Returns:

A TFLite model buffer as bytes object.

mtk_converter.load_tflite_model_as_dict(filename)

Load the TFLite model as a JSON-like dict object via the flatc FlatBuffers compiler.

This function fixes the floating-point precision issues that occur when deserializing the TFLite model to JSON format (see https://github.com/google/flatbuffers/issues/5371 for more detail).

Parameters:

filename – A str value. The TFLite model filename.

Returns:

dict object. The JSON representation of the input TFLite model.
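
A minimal round-trip sketch; editing the schema-level description field is only an example of a possible modification and assumes that field is present in the loaded dict.

    import mtk_converter

    model_dict = mtk_converter.load_tflite_model_as_dict('model_quant.tflite')
    model_dict['description'] = 'patched model'   # hypothetical modification
    mtk_converter.export_tflite_model_from_dict(model_dict, output_file='patched.tflite')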

mtk_converter.plot_model_quant_error(converter, input_data, output_dir, quiet=None)

Plot the quantization error between float/fake-quantized model and quantized model.

The float/fake-quantized model and the (hybrid-)quantized model are produced based on the provided converter object (by changing only the quantize option). Based on these two models, this function plots the following quantization errors to help identify potential ways to enhance the resulting quality:

  • The cumulative maximum/average absolute differences between the output tensor of the corresponding sub-models (with different end-point tensors).

  • The layer-wise maximum/average absolute differences between the output tensor of each corresponding operators in these two models.

  • The layer-wise maximum/average absolute differences between the weight tensor of each corresponding affine operators in these two models.

For example, consider a model with four consecutive operators (namely A, B, C, and D). The cumulative errors cover these four cases: A, A + B, A + B + C, and A + B + C + D, while the layer-wise errors cover these four cases: A, B, C, and D.

The measured differences are expressed with the following two metrics.

  • The number of quantization scales (i.e., absolute differences divided by the quantization scale of the specific tensor).

  • The minimum of the absolute and relative differences (i.e., min(absolute diff, relative diff)). The relative difference is the absolute difference divided by the maximum absolute value of the two source values.

Other than the plot summarizing the quantization error, this function also exports the models and the corresponding input/output data used in the layer-by-layer analysis. The dissected models can be used to check the differences between the host result and the result computed on the MediaTek platforms.

Parameters:
  • converter – A BaseConverter object. The converter object used to produce the model. Users should set all the required options properly.

  • input_data – A numpy.ndarray object or a list of numpy.ndarray objects. The input data to do layer-wise comparison and produce the reference input/output data pairs for all the dissected models. The provided input data will be used to run the FakeQuantized model and therefore should not be quantized.

  • output_dir – A str value. The directory name where the comparison result and the dissected models are saved.

  • quiet – A bool value. Whether to disable the progress bar when running the layer-wise comparison. Defaults to False.
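
A minimal sketch; it assumes converter already has its quantization options configured (e.g., as in the post-training quantization sketch earlier), and the input shape is a placeholder.

    import numpy as np
    import mtk_converter

    mtk_converter.plot_model_quant_error(
        converter,
        np.random.rand(1, 3, 224, 224).astype(np.float32),  # float (not quantized) input
        'quant_error_report',
    )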

class mtk_converter.tfv1.CustomOpAnnotator(op_type, target_device, vendor, output_shapes, quantizable=None, output_quant_ranges=None, custom_attributes=None)

A class that helps to build custom operators in a TensorFlow V1 graph.

The CustomOpAnnotator class allows users to annotate a sub-graph (containing one or several operators) that should be treated as a custom operator when building the TensorFlow V1 model.

The usage is as follows:

  1. Create the CustomOpAnnotator object and configure the settings related to the custom operator.

  2. Invoke the annotate_inputs function to annotate the input tensors of the custom operator. Users should use the returned tensors to build the body of the custom operator.

  3. Build the body of the custom operator. It may contain one or several TensorFlow operators.

  4. Invoke the annotate_outputs function to annotate the output tensors of the custom operator. Users should use the tensors returned by annotate_outputs to build the rest of the TensorFlow model.

The CustomOpAnnotator object will build IdentityN operators as the input and output boundary of the custom operator (the boundary is obtained from annotate_inputs and annotate_outputs functions). Settings of the custom operator will be stored as attributes in these IdentityN operators.

WARNING: This is an experimental API and subject to changes.

Parameters:
  • op_type – A str value. The op_type attribute of the custom operator.

  • target_device – A str value. The target_device attribute of the custom operator.

  • vendor – A str value. The vendor attribute of the custom operator.

  • output_shapes – A str value or a list of list of int values. Describes how to obtain the output tensor shapes of the custom operator. Supports either a shape propagation policy name or a list of fixed tensor shapes. Each tensor shape is expressed by a list of int values.

  • quantizable – A bool value. Whether the custom operator supports quantization or not. Defaults to False.

  • output_quant_ranges – A str value or a list of float values. Describes how to obtain the output tensor quantization ranges of the custom operator when the output tensors are not followed by FakeQuantize operators. Supports either a quantization range propagation policy name or a list of two floating-point values expressed as [(output_1_min, output_1_max), …, (output_n_min, output_n_max)]. Defaults to None (i.e., no output tensor quantization ranges will be deduced).

  • custom_attributes – A list of (key, value) pairs. The attributes of the custom operator. Currently, the following data types are supported: bool, float, int, list of float, and list of int. Defaults to None (i.e., no attributes for this custom operator).

annotate_inputs(tensors, name=None)

Annotate and wrap the input tensors of the custom operator.

Parameters:
  • tensors – A list of tf.Tensor objects. The input tensors of the custom operator.

  • name – A str value. The name of the boundary IdentityN operator. Defaults to None (i.e. use the default name provided by TensorFlow library).

Returns:

A list of tf.Tensor objects. The wrapped input tensors.

annotate_outputs(tensors, name=None)

Annotate and wrap the output tensors of the custom operator.

Parameters:
  • tensors – A list of tf.Tensor objects. The output tensors of the custom operator.

  • name – A str value. The name of the boundary IdentityN operator. This name will also be used for the custom operator in the TensorFlow model after the annotation is resolved. Defaults to None (i.e. use the default name provided by TensorFlow library).

Returns:

A list of tf.Tensor objects. The wrapped output tensors.

mtk_converter.tfv1.resolve_custom_op_annotations(graph_def)

Resolve the annotations in the TensorFlow model.

This function replaces each custom operator annotation (and its corresponding sub-graph) with a single custom operator.

WARNING: This is an experimental API and subject to changes.

Parameters:

graph_def – A tf.GraphDef object. The input TensorFlow model with custom operator annotations.

Returns:

tf.GraphDef object. The resolved TensorFlow model.

mtk_converter.sysconfig.get_include_dir()

Get the directory containing the C++ header files.

Returns:

str value. The directory name.

mtk_converter.sysconfig.get_lib_dir()

Get the directory containing the C++ library files.

Returns:

str value. The directory name.

mtk_converter.sysconfig.get_compile_flags()

Get the compilation flags for building the implementations of custom operators.

Returns:

list of str values. The compilation flags.

mtk_converter.sysconfig.get_link_flags()

Get the link flags for building the implementations of custom operators.

Returns:

list of str values. The link flags.
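As a rough sketch, these sysconfig helpers can be combined into a build command for a custom operator library. The compiler choice, source file, and output names below are illustrative assumptions, and get_compile_flags() may already cover the include directory.

import subprocess
import mtk_converter

# Hypothetical build of a custom operator library; 'my_custom_op.cc' and
# 'libmy_custom_op.so' are placeholders for illustration only.
cmd = (
    ['g++', '-shared', '-fPIC', 'my_custom_op.cc', '-o', 'libmy_custom_op.so',
     '-I' + mtk_converter.sysconfig.get_include_dir(),
     '-L' + mtk_converter.sysconfig.get_lib_dir()]
    + mtk_converter.sysconfig.get_compile_flags()
    + mtk_converter.sysconfig.get_link_flags()
)
subprocess.run(cmd, check=True)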

5.1.3. Executables

5.1.3.1. mtk_caffe_converter

 

Convert Caffe model

 

usage: mtk_caffe_converter [-h] --input_prototxt_file INPUT_PROTOTXT_FILE
                           --input_caffemodel_file INPUT_CAFFEMODEL_FILE
                           --output_file OUTPUT_FILE
                           [--output_file_format {tflite,mlir}]
                           [--tflite_op_export_spec {legacy,legacy_ignore_version,builtin_first,custom_first}]
                           [--input_names INPUT_NAMES]
                           [--input_shapes INPUT_SHAPES]
                           [--output_names OUTPUT_NAMES] [--quantize QUANTIZE]
                           [--input_quantization_bitwidths INPUT_QUANTIZATION_BITWIDTHS]
                           [--default_weights_quantization_bitwidth DEFAULT_WEIGHTS_QUANTIZATION_BITWIDTH]
                           [--use_symmetric_quantization USE_SYMMETRIC_QUANTIZATION]
                           [--use_weights_symmetric_quantization USE_WEIGHTS_SYMMETRIC_QUANTIZATION]
                           [--use_per_output_channel_quantization USE_PER_OUTPUT_CHANNEL_QUANTIZATION]
                           [--use_unsigned_quantization_type USE_UNSIGNED_QUANTIZATION_TYPE]
                           [--ensure_safe_affine_output_quantization_scale ENSURE_SAFE_AFFINE_OUTPUT_QUANTIZATION_SCALE]
                           [--allow_different_affine_output_quantization_type ALLOW_DIFFERENT_AFFINE_OUTPUT_QUANTIZATION_TYPE]
                           [--allow_4w8a_affine_operators ALLOW_4W8A_AFFINE_OPERATORS]
                           [--allow_8w16a_affine_operators ALLOW_8W16A_AFFINE_OPERATORS]
                           [--allow_dynamic_quantization ALLOW_DYNAMIC_QUANTIZATION]
                           [--allow_incompatible_paddings_for_tflite_pooling ALLOW_INCOMPATIBLE_PADDINGS_FOR_TFLITE_POOLING]
                           [--enable_12bit_data_types ENABLE_12BIT_DATA_TYPES]
                           [--ensure_same_concat_quantization_params ENSURE_SAME_CONCAT_QUANTIZATION_PARAMS]
                           [--prepend_input_quantize_ops PREPEND_INPUT_QUANTIZE_OPS]
                           [--append_output_dequantize_ops APPEND_OUTPUT_DEQUANTIZE_OPS]
                           [--input_value_ranges INPUT_VALUE_RANGES]
                           [--allow_missing_quantization_ranges ALLOW_MISSING_QUANTIZATION_RANGES]
                           [--calibration_data_dir CALIBRATION_DATA_DIR]
                           [--calibration_data_regexp CALIBRATION_DATA_REGEXP]
5.1.3.1.1. Named Arguments
--input_prototxt_file

Path to the Caffe prototxt file (.prototxt).

--input_caffemodel_file

Path to the Caffe model file (.caffemodel).

--output_file

Path to the output model file.

--output_file_format

Possible choices: tflite, mlir

The output file format.

--tflite_op_export_spec

Possible choices: legacy, legacy_ignore_version, builtin_first, custom_first

The specification of how the operators are exported to TFLite.

--input_names

Input tensor names (comma separated).

--input_shapes

Input shapes (colon separated, and the dimensions are comma separated).

--output_names

Output tensor names (comma separated).

--quantize

Whether to quantize the model. Should be True or False. Defaults to False.

--input_quantization_bitwidths

The quantization bitwidths for the model input tensors (colon separated). To skip the specific input tensors, users can leave them blank. If only a single bitwidth setting is provided, it will be applied to all the model input tensors. Defaults to 8.

--default_weights_quantization_bitwidth

The default quantization bitwidth for the weights of affine operators. Takes effect only when there are no corresponding FakeQuantize operators. If not set, the default weight quantization bitwidth follows the bitwidth of the input tensor of each corresponding affine operator. Defaults to None.

--use_symmetric_quantization

Whether to apply symmetric quantization to all the tensors except the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to False.

--use_weights_symmetric_quantization

Whether to apply symmetric quantization to all the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to True.

--use_per_output_channel_quantization

Whether to apply per-channel quantization for the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to True.

--use_unsigned_quantization_type

Whether to use unsigned quantized data type if possible. Should be True or False. Defaults to False.

--ensure_safe_affine_output_quantization_scale

Whether to ensure safe quantization scale for affine operators. That is, to guarantee that the output scale will be larger than the product of input scale and weight scale. Should be True or False. Defaults to False.

--allow_different_affine_output_quantization_type

Whether to allow different quantization type for the output tensor of affine operators. Should be True or False. Defaults to True.

--allow_4w8a_affine_operators

Whether to allow affine operators with 8-bit input tensors and 4-bit weight tensors. Should be True or False. Defaults to True.

--allow_8w16a_affine_operators

Whether to allow affine operators with 16-bit input tensors and 8-bit weight tensors. Should be True or False. Defaults to True.

--allow_dynamic_quantization

Whether to allow dynamic quantization for affine operators. Should be True or False. Defaults to False.

--allow_incompatible_paddings_for_tflite_pooling

Whether to allow pooling operators whose padding settings are not compatible with the TFLite padding types. If True, these invalid padding settings will be replaced with preceding Pad operators. Note that the model execution result might be different after the replacement in some cases. Should be True or False. Defaults to False.

--enable_12bit_data_types

Whether to enable 12-bit data types, such as INT12 and UINT12, in the model. If False, tensors with 12-bit quantization bitwidth will be expanded to larger bitwidth (i.e., 16-bit). Should be True or False. Defaults to False.

--ensure_same_concat_quantization_params

Whether to unify the quantization parameters (scale and zero_point) of the input and output tensors of the Concat operators. Should be True or False. Defaults to False.

--prepend_input_quantize_ops

Whether to insert Quantize operators at the beginning of the model to quantize the input tensors when converting a quantized model. In this case, users could pass a floating-point input data when running inferences. Should be True or False. Defaults to False.

--append_output_dequantize_ops

Whether to insert Dequantize operators at the end of the model to dequantize the output tensors when converting a quantized model. In this case, users will get a floating-point output data when running inferences. Should be True or False. Defaults to False.

--input_value_ranges

The value ranges (used for quantization) for the model input tensors (colon separated). Each value range is expressed with two comma-separated floating-point values, that is, the minimum and maximum values. To skip specific input tensors, users can leave them blank. Defaults to None (i.e., no default value ranges are set for the model input tensors).

--allow_missing_quantization_ranges

Whether to allow missing min/max values of the tensors when quantizing the model. Should be True or False. Defaults to False.

--calibration_data_dir

Path to the directory containing the calibration data files used to do post-training quantization.

--calibration_data_regexp

Regular expression for the filename of the calibration data. Filenames should have a .npy or .npz extension.
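A minimal sketch of invoking mtk_caffe_converter for an 8-bit post-training quantization flow; the file paths, tensor names, shapes, and value ranges below are illustrative placeholders.

import subprocess

# Hypothetical invocation; all file names and tensor settings are placeholders.
subprocess.run([
    'mtk_caffe_converter',
    '--input_prototxt_file', 'deploy.prototxt',
    '--input_caffemodel_file', 'weights.caffemodel',
    '--output_file', 'model.tflite',
    '--input_names', 'data',
    '--input_shapes', '1,3,224,224',        # dimensions are comma separated
    '--quantize', 'True',
    '--input_value_ranges', '-1.0,1.0',     # min,max of the single input tensor
    '--calibration_data_dir', './calibration_data',
    '--calibration_data_regexp', r'.*\.npy',
], check=True)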

5.1.3.2. mtk_tensorflow_v1_converter

 

Convert TensorFlow V1 model

 

usage: mtk_tensorflow_v1_converter [-h]
                                   (--input_frozen_graph_def_file INPUT_FROZEN_GRAPH_DEF_FILE | --input_saved_model_dir INPUT_SAVED_MODEL_DIR | --input_keras_model_file INPUT_KERAS_MODEL_FILE)
                                   --output_file OUTPUT_FILE
                                   [--output_file_format {tflite,mlir}]
                                   [--tflite_op_export_spec {legacy,legacy_ignore_version,builtin_first,custom_first}]
                                   [--default_batch_size DEFAULT_BATCH_SIZE]
                                   [--input_names INPUT_NAMES]
                                   [--input_shapes INPUT_SHAPES]
                                   [--output_names OUTPUT_NAMES]
                                   [--tag_set TAG_SET]
                                   [--signature_key SIGNATURE_KEY]
                                   [--quantize QUANTIZE]
                                   [--input_quantization_bitwidths INPUT_QUANTIZATION_BITWIDTHS]
                                   [--default_weights_quantization_bitwidth DEFAULT_WEIGHTS_QUANTIZATION_BITWIDTH]
                                   [--use_symmetric_quantization USE_SYMMETRIC_QUANTIZATION]
                                   [--use_weights_symmetric_quantization USE_WEIGHTS_SYMMETRIC_QUANTIZATION]
                                   [--use_per_output_channel_quantization USE_PER_OUTPUT_CHANNEL_QUANTIZATION]
                                   [--use_unsigned_quantization_type USE_UNSIGNED_QUANTIZATION_TYPE]
                                   [--ensure_safe_affine_output_quantization_scale ENSURE_SAFE_AFFINE_OUTPUT_QUANTIZATION_SCALE]
                                   [--allow_different_affine_output_quantization_type ALLOW_DIFFERENT_AFFINE_OUTPUT_QUANTIZATION_TYPE]
                                   [--allow_4w8a_affine_operators ALLOW_4W8A_AFFINE_OPERATORS]
                                   [--allow_8w16a_affine_operators ALLOW_8W16A_AFFINE_OPERATORS]
                                   [--allow_dynamic_quantization ALLOW_DYNAMIC_QUANTIZATION]
                                   [--allow_incompatible_paddings_for_tflite_pooling ALLOW_INCOMPATIBLE_PADDINGS_FOR_TFLITE_POOLING]
                                   [--enable_12bit_data_types ENABLE_12BIT_DATA_TYPES]
                                   [--ensure_same_concat_quantization_params ENSURE_SAME_CONCAT_QUANTIZATION_PARAMS]
                                   [--prepend_input_quantize_ops PREPEND_INPUT_QUANTIZE_OPS]
                                   [--append_output_dequantize_ops APPEND_OUTPUT_DEQUANTIZE_OPS]
                                   [--input_value_ranges INPUT_VALUE_RANGES]
                                   [--allow_missing_quantization_ranges ALLOW_MISSING_QUANTIZATION_RANGES]
                                   [--calibration_data_dir CALIBRATION_DATA_DIR]
                                   [--calibration_data_regexp CALIBRATION_DATA_REGEXP]
5.1.3.2.1. Named Arguments
--input_frozen_graph_def_file

Path to the GraphDef file to be converted.

--input_saved_model_dir

Path to the SavedModel directory to be converted.

--input_keras_model_file

Path to the keras HDF5 file to be converted.

--output_file

Path to the output model file.

--output_file_format

Possible choices: tflite, mlir

The output file format.

--tflite_op_export_spec

Possible choices: legacy, legacy_ignore_version, builtin_first, custom_first

The specification of how the operators are exported to TFLite.

--default_batch_size

The batch size that will be used when the input shapes deduced from the input model have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. If not provided, the input shapes will remain unchanged.

--input_names

Input tensor names (comma separated). Note that the :0 tensor name postfix can be ignored.

--input_shapes

Input shapes (colon separated, and the dimensions are comma separated).

--output_names

Output tensor names (comma separated). Note that the :0 tensor name postfix can be ignored.

--tag_set

Set of tags (comma separated) identifying the MetaGraphDef within the SavedModel to convert. Takes effect only when --input_saved_model_dir is set. Defaults to serve (i.e., the tag for the serving graph).

--signature_key

Key identifying the SignatureDef containing inputs and outputs. Takes effect only when --input_saved_model_dir is set. Defaults to serving_default (i.e., the default signature key for a TensorFlow SavedModel).

--quantize

Whether to quantize the model. Should be True or False. Defaults to False.

--input_quantization_bitwidths

The quantization bitwidths for the model input tensors (colon separated). To skip the specific input tensors, users can leave them blank. If only a single bitwidth setting is provided, it will be applied to all the model input tensors. Defaults to 8.

--default_weights_quantization_bitwidth

The default quantization bitwidth for the weights of affine operators. Takes effect only when there are no corresponding FakeQuantize operators. If not set, the default weight quantization bitwidth follows the bitwidth of the input tensor of each corresponding affine operator. Defaults to None.

--use_symmetric_quantization

Whether to apply symmetric quantization to all the tensors except the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to False.

--use_weights_symmetric_quantization

Whether to apply symmetric quantization to all the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to True.

--use_per_output_channel_quantization

Whether to apply per-channel quantization for the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to True.

--use_unsigned_quantization_type

Whether to use unsigned quantized data type if possible. Should be True or False. Defaults to False.

--ensure_safe_affine_output_quantization_scale

Whether to ensure safe quantization scale for affine operators. That is, to guarantee that the output scale will be larger than the product of input scale and weight scale. Should be True or False. Defaults to False.

--allow_different_affine_output_quantization_type

Whether to allow different quantization type for the output tensor of affine operators. Should be True or False. Defaults to True.

--allow_4w8a_affine_operators

Whether to allow affine operators with 8-bit input tensors and 4-bit weight tensors. Should be True or False. Defaults to True.

--allow_8w16a_affine_operators

Whether to allow affine operators with 16-bit input tensors and 8-bit weight tensors. Should be True or False. Defaults to True.

--allow_dynamic_quantization

Whether to allow dynamic quantization for affine operators. Should be True or False. Defaults to False.

--allow_incompatible_paddings_for_tflite_pooling

Whether to allow pooling operators whose padding settings are not compatible with the TFLite padding types. If True, these invalid padding settings will be replaced with preceding Pad operators. Note that the model execution result might be different after the replacement in some cases. Should be True or False. Defaults to False.

--enable_12bit_data_types

Whether to enable 12-bit data types, such as INT12 and UINT12, in the model. If False, tensors with 12-bit quantization bitwidth will be expanded to larger bitwidth (i.e., 16-bit). Should be True or False. Defaults to False.

--ensure_same_concat_quantization_params

Whether to unify the quantization parameters (scale and zero_point) of the input and output tensors of the Concat operators. Should be True or False. Defaults to False.

--prepend_input_quantize_ops

Whether to insert Quantize operators at the beginning of the model to quantize the input tensors when converting a quantized model. In this case, users could pass a floating-point input data when running inferences. Should be True or False. Defaults to False.

--append_output_dequantize_ops

Whether to insert Dequantize operators at the end of the model to dequantize the output tensors when converting a quantized model. In this case, users will get a floating-point output data when running inferences. Should be True or False. Defaults to False.

--input_value_ranges

The value ranges (used for quantization) for the model input tensors (colon separated). Each value range is expressed with two comma-separated floating-point values, that is, the minimum and maximum values. To skip specific input tensors, users can leave them blank. Defaults to None (i.e., no default value ranges are set for the model input tensors).

--allow_missing_quantization_ranges

Whether to allow missing min/max values of the tensors when quantizing the model. Should be True or False. Defaults to False.

--calibration_data_dir

Path to the directory containing the calibration data files used to do post-training quantization.

--calibration_data_regexp

Regular expression for the filename of the calibration data. Filenames should have a .npy or .npz extension.
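A minimal sketch of converting a TensorFlow V1 SavedModel without quantization; the directory, tag, signature key, and tensor names below are illustrative placeholders.

import subprocess

# Hypothetical invocation; the SavedModel directory and tensor names are placeholders.
subprocess.run([
    'mtk_tensorflow_v1_converter',
    '--input_saved_model_dir', './saved_model',
    '--tag_set', 'serve',
    '--signature_key', 'serving_default',
    '--output_file', 'model.tflite',
    '--input_names', 'input',        # the :0 postfix can be omitted
    '--input_shapes', '1,224,224,3',
    '--output_names', 'logits',
], check=True)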

5.1.3.3. mtk_tensorflow_converter

 

Convert TensorFlow V2 (and above) model

 

usage: mtk_tensorflow_converter [-h]
                                (--input_saved_model_dir INPUT_SAVED_MODEL_DIR | --input_keras_model_file INPUT_KERAS_MODEL_FILE)
                                --output_file OUTPUT_FILE
                                [--output_file_format {tflite,mlir}]
                                [--tflite_op_export_spec {legacy,legacy_ignore_version,builtin_first,custom_first}]
                                [--default_batch_size DEFAULT_BATCH_SIZE]
                                [--input_names INPUT_NAMES]
                                [--input_shapes INPUT_SHAPES]
                                [--output_names OUTPUT_NAMES]
                                [--tag_set TAG_SET]
                                [--signature_keys SIGNATURE_KEYS]
                                [--quantize QUANTIZE]
                                [--input_quantization_bitwidths INPUT_QUANTIZATION_BITWIDTHS]
                                [--default_weights_quantization_bitwidth DEFAULT_WEIGHTS_QUANTIZATION_BITWIDTH]
                                [--use_symmetric_quantization USE_SYMMETRIC_QUANTIZATION]
                                [--use_weights_symmetric_quantization USE_WEIGHTS_SYMMETRIC_QUANTIZATION]
                                [--use_per_output_channel_quantization USE_PER_OUTPUT_CHANNEL_QUANTIZATION]
                                [--use_unsigned_quantization_type USE_UNSIGNED_QUANTIZATION_TYPE]
                                [--ensure_safe_affine_output_quantization_scale ENSURE_SAFE_AFFINE_OUTPUT_QUANTIZATION_SCALE]
                                [--allow_different_affine_output_quantization_type ALLOW_DIFFERENT_AFFINE_OUTPUT_QUANTIZATION_TYPE]
                                [--allow_4w8a_affine_operators ALLOW_4W8A_AFFINE_OPERATORS]
                                [--allow_8w16a_affine_operators ALLOW_8W16A_AFFINE_OPERATORS]
                                [--allow_dynamic_quantization ALLOW_DYNAMIC_QUANTIZATION]
                                [--allow_incompatible_paddings_for_tflite_pooling ALLOW_INCOMPATIBLE_PADDINGS_FOR_TFLITE_POOLING]
                                [--enable_12bit_data_types ENABLE_12BIT_DATA_TYPES]
                                [--ensure_same_concat_quantization_params ENSURE_SAME_CONCAT_QUANTIZATION_PARAMS]
                                [--prepend_input_quantize_ops PREPEND_INPUT_QUANTIZE_OPS]
                                [--append_output_dequantize_ops APPEND_OUTPUT_DEQUANTIZE_OPS]
                                [--input_value_ranges INPUT_VALUE_RANGES]
                                [--allow_missing_quantization_ranges ALLOW_MISSING_QUANTIZATION_RANGES]
                                [--calibration_data_dir CALIBRATION_DATA_DIR]
                                [--calibration_data_regexp CALIBRATION_DATA_REGEXP]
5.1.3.3.1. Named Arguments
--input_saved_model_dir

Path to the SavedModel directory to be converted.

--input_keras_model_file

Path to the keras HDF5 file to be converted.

--output_file

Path to the output model file.

--output_file_format

Possible choices: tflite, mlir

The output file format.

--tflite_op_export_spec

Possible choices: legacy, legacy_ignore_version, builtin_first, custom_first

The specification of how the operators are exported to TFLite.

--default_batch_size

The batch size that will be used when the input shapes deduced from the input model have dynamic batch sizes. Takes effect only when the input_shapes argument is not provided. If not provided, the input shapes will remain unchanged.

--input_names

Input tensor names (comma separated). Note that the :0 tensor name postfix can be ignored.

--input_shapes

Input shapes (colon separated, and the dimensions are comma separated).

--output_names

Output tensor names (comma separated). Note that the :0 tensor name postfix can be ignored.

--tag_set

Set of tags (comma separated) identifying the MetaGraphDef within the SavedModel to convert. Takes effect only when --input_saved_model_dir is set. Defaults to serve (i.e., the tag for the serving graph).

--signature_keys

Keys (comma separated) identifying the SignatureDef containing inputs and outputs. Takes effect only when --input_saved_model_dir is set. Defaults to serving_default (i.e., the default signature key for a TensorFlow SavedModel).

--quantize

Whether to quantize the model. Should be True or False. Defaults to False.

--input_quantization_bitwidths

The quantization bitwidths for the model input tensors (colon separated). To skip the specific input tensors, users can leave them blank. If only a single bitwidth setting is provided, it will be applied to all the model input tensors. Defaults to 8.

--default_weights_quantization_bitwidth

The default quantization bitwidth for the weights of affine operators. Takes effect only when there are no corresponding FakeQuantize operators. If not set, the default weight quantization bitwidth follows the bitwidth of the input tensor of each corresponding affine operator. Defaults to None.

--use_symmetric_quantization

Whether to apply symmetric quantization to all the tensors except the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to False.

--use_weights_symmetric_quantization

Whether to apply symmetric quantization to all the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to True.

--use_per_output_channel_quantization

Whether to apply per-channel quantization for the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to True.

--use_unsigned_quantization_type

Whether to use unsigned quantized data type if possible. Should be True or False. Defaults to False.

--ensure_safe_affine_output_quantization_scale

Whether to ensure safe quantization scale for affine operators. That is, to guarantee that the output scale will be larger than the product of input scale and weight scale. Should be True or False. Defaults to False.

--allow_different_affine_output_quantization_type

Whether to allow different quantization type for the output tensor of affine operators. Should be True or False. Defaults to True.

--allow_4w8a_affine_operators

Whether to allow affine operators with 8-bit input tensors and 4-bit weight tensors. Should be True or False. Defaults to True.

--allow_8w16a_affine_operators

Whether to allow affine operators with 16-bit input tensors and 8-bit weight tensors. Should be True or False. Defaults to True.

--allow_dynamic_quantization

Whether to allow dynamic quantization for affine operators. Should be True or False. Defaults to False.

--allow_incompatible_paddings_for_tflite_pooling

Whether to allow pooling operators whose padding settings are not compatible with the TFLite padding types. If True, these invalid padding settings will be replaced with preceding Pad operators. Note that the model execution result might be different after the replacement in some cases. Should be True or False. Defaults to False.

--enable_12bit_data_types

Whether to enable 12-bit data types, such as INT12 and UINT12, in the model. If False, tensors with 12-bit quantization bitwidth will be expanded to larger bitwidth (i.e., 16-bit). Should be True or False. Defaults to False.

--ensure_same_concat_quantization_params

Whether to unify the quantization parameters (scale and zero_point) of the input and output tensors of the Concat operators. Should be True or False. Defaults to False.

--prepend_input_quantize_ops

Whether to insert Quantize operators at the beginning of the model to quantize the input tensors when converting a quantized model. In this case, users could pass a floating-point input data when running inferences. Should be True or False. Defaults to False.

--append_output_dequantize_ops

Whether to insert Dequantize operators at the end of the model to dequantize the output tensors when converting a quantized model. In this case, users will get a floating-point output data when running inferences. Should be True or False. Defaults to False.

--input_value_ranges

The value ranges (used for quantization) for the model input tensors (colon separated). Each value range is expressed with two comma-separated floating-point values, that is, the minimum and maximum values. To skip specific input tensors, users can leave them blank. Defaults to None (i.e., no default value ranges are set for the model input tensors).

--allow_missing_quantization_ranges

Whether to allow missing min/max values of the tensors when quantizing the model. Should be True or False. Defaults to False.

--calibration_data_dir

Path to the directory containing the calibration data files used to do post-training quantization.

--calibration_data_regexp

Regular expression for the filename of the calibration data. Filenames should have a .npy or .npz extension.
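A minimal sketch of converting a Keras HDF5 model into a quantized TFLite model that accepts and returns floating-point data; the file names and calibration settings below are illustrative placeholders.

import subprocess

# Hypothetical invocation; file names and calibration settings are placeholders.
subprocess.run([
    'mtk_tensorflow_converter',
    '--input_keras_model_file', 'model.h5',
    '--output_file', 'model_quant.tflite',
    '--quantize', 'True',
    '--prepend_input_quantize_ops', 'True',    # inputs stay floating point
    '--append_output_dequantize_ops', 'True',  # outputs are returned as floating point
    '--calibration_data_dir', './calibration_data',
    '--calibration_data_regexp', r'.*\.npy',
], check=True)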

5.1.3.4. mtk_pytorch_converter

 

PyTorch model converter

 

usage: mtk_pytorch_converter [-h] --input_script_module_file
                             INPUT_SCRIPT_MODULE_FILE --output_file
                             OUTPUT_FILE [--output_file_format {tflite,mlir}]
                             [--tflite_op_export_spec {legacy,legacy_ignore_version,builtin_first,custom_first}]
                             --input_shapes INPUT_SHAPES
                             [--input_types INPUT_TYPES] [--quantize QUANTIZE]
                             [--input_quantization_bitwidths INPUT_QUANTIZATION_BITWIDTHS]
                             [--default_weights_quantization_bitwidth DEFAULT_WEIGHTS_QUANTIZATION_BITWIDTH]
                             [--use_symmetric_quantization USE_SYMMETRIC_QUANTIZATION]
                             [--use_weights_symmetric_quantization USE_WEIGHTS_SYMMETRIC_QUANTIZATION]
                             [--use_per_output_channel_quantization USE_PER_OUTPUT_CHANNEL_QUANTIZATION]
                             [--use_unsigned_quantization_type USE_UNSIGNED_QUANTIZATION_TYPE]
                             [--ensure_safe_affine_output_quantization_scale ENSURE_SAFE_AFFINE_OUTPUT_QUANTIZATION_SCALE]
                             [--allow_different_affine_output_quantization_type ALLOW_DIFFERENT_AFFINE_OUTPUT_QUANTIZATION_TYPE]
                             [--allow_4w8a_affine_operators ALLOW_4W8A_AFFINE_OPERATORS]
                             [--allow_8w16a_affine_operators ALLOW_8W16A_AFFINE_OPERATORS]
                             [--allow_dynamic_quantization ALLOW_DYNAMIC_QUANTIZATION]
                             [--allow_incompatible_paddings_for_tflite_pooling ALLOW_INCOMPATIBLE_PADDINGS_FOR_TFLITE_POOLING]
                             [--enable_12bit_data_types ENABLE_12BIT_DATA_TYPES]
                             [--ensure_same_concat_quantization_params ENSURE_SAME_CONCAT_QUANTIZATION_PARAMS]
                             [--prepend_input_quantize_ops PREPEND_INPUT_QUANTIZE_OPS]
                             [--append_output_dequantize_ops APPEND_OUTPUT_DEQUANTIZE_OPS]
                             [--input_value_ranges INPUT_VALUE_RANGES]
                             [--allow_missing_quantization_ranges ALLOW_MISSING_QUANTIZATION_RANGES]
                             [--calibration_data_dir CALIBRATION_DATA_DIR]
                             [--calibration_data_regexp CALIBRATION_DATA_REGEXP]
5.1.3.4.1. Named Arguments
--input_script_module_file

The PyTorch ScriptModule file to be converted.

--output_file

Path to the output model file.

--output_file_format

Possible choices: tflite, mlir

The output file format.

--tflite_op_export_spec

Possible choices: legacy, legacy_ignore_version, builtin_first, custom_first

The specification of how the operators are exported to TFLite.

--input_shapes

Input tensor shapes of the output model (colon separated, and the dimensions are comma separated). Should have the same number of entries as the model input tensors.

--input_types

Input tensor types of the input ScriptModule model (comma separated). Can be float32, float64, int32, int64, or bool. Should have the same number of entries as the model input tensors. Defaults to None (i.e., all inputs will have float32 type).

--quantize

Whether to quantize the model. Should be True or False. Defaults to False.

--input_quantization_bitwidths

The quantization bitwidths for the model input tensors (colon separated). To skip the specific input tensors, users can leave them blank. If only a single bitwidth setting is provided, it will be applied to all the model input tensors. Defaults to 8.

--default_weights_quantization_bitwidth

The default quantization bitwidth for the weights of affine operators. Takes effect only when there are no corresponding FakeQuantize operators. If not set, the default weight quantization bitwidth follows the bitwidth of the input tensor of each corresponding affine operator. Defaults to None.

--use_symmetric_quantization

Whether to apply symmetric quantization to all the tensors except the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to False.

--use_weights_symmetric_quantization

Whether to apply symmetric quantization to all the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to True.

--use_per_output_channel_quantization

Whether to apply per-channel quantization for the weights of affine operators when doing post-training quantization. Should be True or False. Defaults to True.

--use_unsigned_quantization_type

Whether to use unsigned quantized data type if possible. Should be True or False. Defaults to False.

--ensure_safe_affine_output_quantization_scale

Whether to ensure safe quantization scale for affine operators. That is, to guarantee that the output scale will be larger than the product of input scale and weight scale. Should be True or False. Defaults to False.

--allow_different_affine_output_quantization_type

Whether to allow different quantization type for the output tensor of affine operators. Should be True or False. Defaults to True.

--allow_4w8a_affine_operators

Whether to allow affine operators with 8-bit input tensors and 4-bit weight tensors. Should be True or False. Defaults to True.

--allow_8w16a_affine_operators

Whether to allow affine operators with 16-bit input tensors and 8-bit weight tensors. Should be True or False. Defaults to True.

--allow_dynamic_quantization

Whether to allow dynamic quantization for affine operators. Should be True or False. Defaults to False.

--allow_incompatible_paddings_for_tflite_pooling

Whether to allow pooling operators whose padding settings are not compatible with the TFLite padding types. If True, these invalid padding settings will be replaced with preceding Pad operators. Note that the model execution result might be different after the replacement in some cases. Should be True or False. Defaults to False.

--enable_12bit_data_types

Whether to enable 12-bit data types, such as INT12 and UINT12, in the model. If False, tensors with 12-bit quantization bitwidth will be expanded to larger bitwidth (i.e., 16-bit). Should be True or False. Defaults to False.

--ensure_same_concat_quantization_params

Whether to unify the quantization parameters (scale and zero_point) of the input and output tensors of the Concat operators. Should be True or False. Defaults to False.

--prepend_input_quantize_ops

Whether to insert Quantize operators at the beginning of the model to quantize the input tensors when converting a quantized model. In this case, users could pass a floating-point input data when running inferences. Should be True or False. Defaults to False.

--append_output_dequantize_ops

Whether to insert Dequantize operators at the end of the model to dequantize the output tensors when converting a quantized model. In this case, users will get a floating-point output data when running inferences. Should be True or False. Defaults to False.

--input_value_ranges

The value ranges (used for quantization) for the model input tensors (colon separated). Each value range is expressed with two comma-separated floating-point values, that is, the minimum and maximum values. To skip specific input tensors, users can leave them blank. Defaults to None (i.e., no default value ranges are set for the model input tensors).

--allow_missing_quantization_ranges

Whether to allow missing min/max values of the tensors when quantizing the model. Should be True or False. Defaults to False.

--calibration_data_dir

Path to the directory containing the calibration data files used to do post-training quantization.

--calibration_data_regexp

Regular expression for the filename of the calibration data. Filenames should have a .npy or .npz extension.
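A minimal sketch of converting a TorchScript module with two inputs of different types, showing the required --input_shapes argument and the optional --input_types argument; the file names, shapes, and types below are illustrative placeholders.

import subprocess

# Hypothetical invocation; the module file, shapes, and types are placeholders.
subprocess.run([
    'mtk_pytorch_converter',
    '--input_script_module_file', 'model.pt',
    '--output_file', 'model.tflite',
    '--input_shapes', '1,3,224,224:1,10',    # one shape per input, colon separated
    '--input_types', 'float32,int32',        # one type per input, comma separated
], check=True)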

 

5.2. Java Interpreter API Reference

5.2.1. Interpreter.java

class Interpreter : public AutoCloseable

Driver class to drive model inference with TensorFlow Lite.

An Interpreter encapsulates a pre-trained TensorFlow Lite model, in which operations are executed for model inference.

For example, if a model takes only one input and returns only one output:

 

try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.run(input, output);
}

 

If a model takes multiple inputs or outputs:

 

Object[] inputs = {input0, input1, ...};
Map<Integer, Object> map_of_indices_to_outputs = new HashMap<>();
FloatBuffer ith_output = FloatBuffer.allocateDirect(3 * 2 * 4);  // Float tensor, shape 3x2x4.
ith_output.order(ByteOrder.nativeOrder());
map_of_indices_to_outputs.put(i, ith_output);
try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);
}

 

If a model takes or produces string tensors:

 

String[] input = {"foo", "bar"};  // Input tensor shape is [2].
String[][] output = new String[3][2];  // Output tensor shape is [3, 2].
try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.runForMultipleInputsOutputs(input, output);
}

 

Orders of inputs and outputs are determined when converting TensorFlow model to TensorFlowLite model with Toco, as are the default shapes of the inputs.

When inputs are provided as (multi-dimensional) arrays, the corresponding input tensor(s) will be implicitly resized according to that array’s shape. When inputs are provided as java.nio.Buffer types, no implicit resizing is done; the caller must ensure that the java.nio.Buffer byte size either matches that of the corresponding tensor, or that they first resize the tensor via resizeInput(int, @NonNull int[]). Tensor shape and type information can be obtained via the Tensor class, available via getInputTensor(int) and getOutputTensor(int).

WARNING: Instances of an Interpreter are not thread-safe. An Interpreter owns resources that must be explicitly freed by invoking close().

The TFLite library is built against NDK API 19. It may work for Android API levels below 19, but is not guaranteed.

Note: This class is not thread safe.

Public Functions

inline  Interpreter (@NonNull File modelFile)

Initializes an Interpreter.

Parameters:

modelFile – a File of a pre-trained TF Lite model.

Throws:

IllegalArgumentException – if modelFile does not encode a valid TensorFlow Lite model.

inline  Interpreter (@NonNull File modelFile, int numThreads)

Initializes an Interpreter and specifies the number of threads used for inference.

 

Deprecated:

Prefer using the Interpreter(File,Options) constructor. This method will be removed in a future release.

 

Parameters:
  • modelFile – a file of a pre-trained TF Lite model

  • numThreads – number of threads to use for inference

inline  Interpreter (@NonNull File modelFile, Options options)

Initializes an Interpreter and specifies options for customizing interpreter behavior.

Parameters:
  • modelFile – a file of a pre-trained TF Lite model

  • options – a set of options for customizing interpreter behavior

Throws:

IllegalArgumentException – if modelFile does not encode a valid TensorFlow Lite model.

inline  Interpreter (@NonNull ByteBuffer byteBuffer)

Initializes an Interpreter with a ByteBuffer of a model file.

The ByteBuffer should not be modified after the construction of an Interpreter. The ByteBuffer can be either a MappedByteBuffer that memory-maps a model file, or a direct ByteBuffer of nativeOrder() that contains the bytes content of a model.

Throws:

IllegalArgumentException – if byteBuffer is not a MappedByteBuffer nor a direct ByteBuffer of nativeOrder.

inline  Interpreter (@NonNull ByteBuffer byteBuffer, int numThreads)

Initializes an Interpreter with a ByteBuffer of a model file and specifies the number of threads used for inference.

The ByteBuffer should not be modified after the construction of an Interpreter. The ByteBuffer can be either a MappedByteBuffer that memory-maps a model file, or a direct ByteBuffer of nativeOrder() that contains the bytes content of a model.

 

Deprecated:

Prefer using the Interpreter(ByteBuffer,Options) constructor. This method will be removed in a future release.

 

inline  Interpreter (@NonNull MappedByteBuffer mappedByteBuffer)

Initializes an Interpreter with a MappedByteBuffer to the model file.

The MappedByteBuffer should remain unchanged after the construction of an Interpreter.

 

Deprecated:

Prefer using the Interpreter(ByteBuffer,Options) constructor. This method will be removed in a future release.

 

inline  Interpreter (@NonNull ByteBuffer byteBuffer, Options options)

Initializes an Interpreter with a ByteBuffer of a model file and a set of custom Interpreter.Options.

The ByteBuffer should not be modified after the construction of an Interpreter. The ByteBuffer can be either a MappedByteBuffer that memory-maps a model file, or a direct ByteBuffer of nativeOrder() that contains the bytes content of a model.

Throws:

IllegalArgumentException – if byteBuffer is not a MappedByteBuffer nor a direct ByteBuffer of nativeOrder.

inline void run(Object input, Object output)

Runs model inference if the model takes only one input, and provides only one output.

Warning: The API is more efficient if a java.nio.Buffer (preferably direct, but not required) is used as the input/output data type. Please consider using java.nio.Buffer to feed and fetch primitive data for better performance. The following concrete java.nio.Buffer types are supported:

 

  • ByteBuffer - compatible with any underlying primitive Tensor type.

  • java.nio.FloatBuffer - compatible with float Tensors.

  • java.nio.IntBuffer - compatible with int32 Tensors.

  • java.nio.LongBuffer - compatible with int64 Tensors.

 

Note that boolean types are only supported as arrays, not java.nio.Buffers, or as scalar inputs.

 

See also

Interpreter.Options#setAllowBufferHandleOutput(boolean).

 

Parameters:
  • input – an array or multidimensional array, or a java.nio.Buffer of primitive types including int, float, long, and byte. java.nio.Buffer is the preferred way to pass large input data for primitive types, whereas string types require using the (multi-dimensional) array input path. When a java.nio.Buffer is used, its content should remain unchanged until model inference is done, and the caller must ensure that the java.nio.Buffer is at the appropriate read position. A null value is allowed only if the caller is using a Delegate that allows buffer handle interop, and such a buffer has been bound to the input Tensor.

  • output – a multidimensional array of output data, or a java.nio.Buffer of primitive types including int, float, long, and byte. When a java.nio.Buffer is used, the caller must ensure that it is set to the appropriate write position. A null value is allowed only if the caller is using a Delegate that allows buffer handle interop, and such a buffer has been bound to the output Tensor.

Throws:
  • IllegalArgumentException – if input or output is null or empty, or if error occurs when running the inference.

  • IllegalArgumentException – (EXPERIMENTAL, subject to change) if the inference is interrupted by setCancelled(true).

inline void runForMultipleInputsOutputs ( @NonNull Object[] inputs, @NonNull Map< Integer, Object > outputs)

Runs model inference if the model takes multiple inputs, or returns multiple outputs.

Warning: The API is more efficient if java.nio.Buffers (preferably direct, but not required) are used as the input/output data types. Please consider using java.nio.Buffer to feed and fetch primitive data for better performance. The following concrete java.nio.Buffer types are supported:

 

  • ByteBuffer - compatible with any underlying primitive Tensor type.

  • java.nio.FloatBuffer - compatible with float Tensors.

  • java.nio.IntBuffer - compatible with int32 Tensors.

  • java.nio.LongBuffer - compatible with int64 Tensors.

 

Note that boolean types are only supported as arrays, not java.nio.Buffers, or as scalar inputs.

Note: null values for individual elements of inputs and outputs are allowed only if the caller is using a Delegate that allows buffer handle interop, and such a buffer has been bound to the corresponding input or output Tensor(s).

Parameters:
  • inputs – an array of input data. The inputs should be in the same order as inputs of the model. Each input can be an array or multidimensional array, or a java.nio.Buffer of primitive types including int, float, long, and byte. java.nio.Buffer is the preferred way to pass large input data, whereas string types require using the (multi-dimensional) array input path. When java.nio.Buffer is used, its content should remain unchanged until model inference is done, and the caller must ensure that the java.nio.Buffer is at the appropriate read position.

  • outputs – a map mapping output indices to multidimensional arrays of output data or java.nio.Buffers of primitive types including int, float, long, and byte. It only needs to keep entries for the outputs to be used. When a java.nio.Buffer is used, the caller must ensure that it is set to the appropriate write position.

Throws:

IllegalArgumentException – if inputs or outputs is null or empty, or if error occurs when running the inference.

inline void runSignature ( @NonNull Map< String, Object > inputs, @NonNull Map< String, Object > outputs, String methodName)

Runs model inference based on SignatureDef provided through methodName.

See Interpreter#run(Object, Object) for more details on the allowed input and output data types.

WARNING: This is an experimental API and subject to change.

Parameters:
  • inputs – A Map of inputs from input name in the signatureDef to an input object.

  • outputs – a map mapping from output name in SignatureDef to output data.

  • methodName – The exported method name identifying the SignatureDef.

Throws:

IllegalArgumentException – if inputs or outputs or methodName is null or empty, or if error occurs when running the inference.

inline void runSignature ( @NonNull Map< String, Object > inputs, @NonNull Map< String, Object > outputs)
 
inline void allocateTensors()

Explicitly updates allocations for all tensors, if necessary.

This will propagate shapes and memory allocations for all dependent tensors using the input tensor shape(s) as given.

Note: This call is purely optional. Tensor allocation will occur automatically during execution if any input tensors have been resized. This call is most useful in determining the shapes for any output tensors before executing the graph, e.g.,

 

interpreter.resizeInput(0, new int[]{1, 4, 4, 3});
interpreter.allocateTensors();
FloatBuffer input = FloatBuffer.allocate(interpreter.getInputTensor(0).numElements());
// Populate inputs...
FloatBuffer output = FloatBuffer.allocate(interpreter.getOutputTensor(0).numElements());
interpreter.run(input, output);
// Process outputs...

 

Throws:

IllegalStateException – if the graph’s tensors could not be successfully allocated.

inline void resizeInput (int idx, @NonNull int[] dims)

Resizes idx-th input of the native model to the given dims.

Throws:

IllegalArgumentException – if idx is negative or is not smaller than the number of model inputs, or if an error occurs when resizing the idx-th input.

inline void resizeInput (int idx, @NonNull int[] dims, boolean strict)

Resizes idx-th input of the native model to the given dims.

When strict is True, only unknown dimensions can be resized. Unknown dimensions are indicated as -1 in the array returned by Tensor.shapeSignature().

Throws:

IllegalArgumentException – if idx is negative or is not smaller than the number of model inputs, or if an error occurs when resizing the idx-th input. Additionally, the error occurs when attempting to resize a tensor with fixed dimensions when strict is True.

inline int getInputTensorCount()

Gets the number of input tensors.

inline int getInputIndex(String opName)

Gets index of an input given the op name of the input.

Throws:

IllegalArgumentException – if opName does not match any input in the model used to initialize the Interpreter.

inline Tensor getInputTensor(int inputIndex)

Gets the Tensor associated with the provided input index.

Throws:

IllegalArgumentException – if inputIndex is negative or is not smaller than the number of model inputs.

inline Tensor getInputTensorFromSignature(String inputName, String methodName)

Gets the Tensor associated with the provided input name and signature method name.

WARNING: This is an experimental API and subject to change.

Parameters:
  • inputName – Input name in the signature.

  • methodName – The exported method name identifying the SignatureDef, can be null if the model has one signature.

Throws:

IllegalArgumentException – if inputName or methodName is null or empty, or invalid name provided.

inline String[] getSignatureDefNames ()

Gets the list of SignatureDef exported method names available in the model.

WARNING: This is an experimental API and subject to change.

inline String[] getSignatureInputs (String methodName)

Gets the list of SignatureDefs inputs for method methodName

WARNING: This is an experimental API and subject to change.

inline String[] getSignatureOutputs (String methodName)

Gets the list of SignatureDefs outputs for method methodName

WARNING: This is an experimental API and subject to change.

inline int getOutputTensorCount()

Gets the number of output Tensors.

inline int getOutputIndex(String opName)

Gets index of an output given the op name of the output.

Throws:

IllegalArgumentException – if opName does not match any output in the model used to initialize the Interpreter.

inline Tensor getOutputTensor(int outputIndex)

Gets the Tensor associated with the provided output index.

Note: Output tensor details (e.g., shape) may not be fully populated until after inference is executed. If you need updated details before running inference (e.g., after resizing an input tensor, which may invalidate output tensor shapes), use allocateTensors() to explicitly trigger allocation and shape propagation. Note that, for graphs with output shapes that are dependent on input values, the output shape may not be fully determined until running inference.

Throws:

IllegalArgumentException – if outputIndex is negative or is not smaller than the number of model outputs.

inline Tensor getOutputTensorFromSignature(String outputName, String methodName)

Gets the Tensor associated with the provided output name in a specific signature method.

Note: Output tensor details (e.g., shape) may not be fully populated until after inference is executed. If you need updated details before running inference (e.g., after resizing an input tensor, which may invalidate output tensor shapes), use allocateTensors() to explicitly trigger allocation and shape propagation. Note that, for graphs with output shapes that are dependent on input values, the output shape may not be fully determined until running inference.

WARNING: This is an experimental API and subject to change.

Parameters:
  • outputName – Output name in the signature.

  • methodName – The exported method name identifying the SignatureDef, can be null if the model has one signature.

Throws:

IllegalArgumentException – if outputName or methodName is null or empty, or invalid name provided.

inline Long getLastNativeInferenceDurationNanoseconds()

Returns native inference timing.

Throws:

IllegalArgumentException – if the model is not initialized by the Interpreter.

inline void setNumThreads(int numThreads)

Sets the number of threads to be used for ops that support multi-threading.

 

Deprecated:

Prefer using Interpreter.Options#setNumThreads(int) directly for controlling multi-threading. This method will be removed in a future release.

 

inline void modifyGraphWithDelegate(Delegate delegate)

Advanced: Modifies the graph with the provided Delegate.

 

Deprecated:

Prefer using Interpreter.Options#addDelegate to provide delegates at creation time. This method will be removed in a future release.

 

Throws:

IllegalArgumentException – if error occurs when modifying graph with delegate.

inline void resetVariableTensors()

Advanced: Resets all variable tensors to the default value.

If a variable tensor doesn’t have an associated buffer, it will be reset to zero.

WARNING: This is an experimental API and subject to change.

inline void setCancelled(boolean cancelled)

Advanced: Interrupts inference in the middle of a call to Interpreter#run.

A cancellation flag will be set to true when this function gets called. The interpreter will check the flag between Op invocations, and if it’s true, the interpreter will stop execution. The interpreter will remain in a cancelled state until explicitly “uncancelled” by setCancelled(false).

WARNING: This is an experimental API and subject to change.

 

See also

Interpreter.Options#setCancellable(boolean).

 

Parameters:

cancelled – true to cancel inference in a best-effort way; false to resume.

Throws:

IllegalStateException – if the interpreter is not initialized with the cancellable option, which is by default off.

inline void close()

Release resources associated with the Interpreter.

Protected Functions

inline void finalize()
 

Package Functions

inline int getExecutionPlanLength()
 

Package Attributes

NativeInterpreterWrapper wrapper
 
String[] signatureNameList
 

Private Functions

inline void checkNotClosed()
 
class Options

An options class for controlling runtime interpreter behavior.

Public Functions

inline Options()
 
inline Options setNumThreads(int numThreads)

Sets the number of threads to be used for ops that support multi-threading. Defaults to a platform-dependent value.

inline Options setUseNNAPI(boolean useNNAPI)

Sets whether to use NN API (if available) for op execution. Defaults to false (disabled).

inline Options setAllowFp16PrecisionForFp32(boolean allow)

Sets whether to allow float16 precision for FP32 calculation when possible. Defaults to false (disallow).

 

Deprecated:

Prefer using org.tensorflow.lite.nnapi.NnApiDelegate.Options#setAllowFp16(boolean enable)

 

inline Options addDelegate(Delegate delegate)

Adds a Delegate to be applied during interpreter creation.

WARNING: This is an experimental interface that is subject to change.

inline Options setAllowBufferHandleOutput(boolean allow)

Advanced: Set if buffer handle output is allowed.

When a Delegate supports hardware acceleration, the interpreter will make the data of output tensors available in the CPU-allocated tensor buffers by default. If the client can consume the buffer handle directly (e.g. reading output from OpenGL texture), it can set this flag to false, avoiding the copy of data to the CPU buffer. The delegate documentation should indicate whether this is supported and how it can be used.

WARNING: This is an experimental interface that is subject to change.

inline Options setCancellable(boolean allow)

Advanced: Set if the interpreter is able to be cancelled.

 

See also

Interpreter#setCancelled(boolean).

 

inline Options setUseXNNPACK(boolean useXNNPACK)

Experimental: Enable an optimized set of floating point CPU kernels (provided by XNNPACK).

Enabling this flag will enable use of a new, highly optimized set of CPU kernels provided via the XNNPACK delegate. Currently, this is restricted to a subset of floating point operations. Eventually, we plan to enable this by default, as it can provide significant performance benefits for many classes of floating point models. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/delegates/xnnpack/README.md for more details.

Things to keep in mind when enabling this flag:

 

  • Startup time and resize time may increase.

  • Baseline memory consumption may increase.

  • May be ignored if another delegate (e.g., NNAPI) has been applied.

  • Quantized models will not see any benefit.

 

WARNING: This is an experimental interface that is subject to change.

Package Attributes

int numThreads = -1
 
Boolean useNNAPI
 
Boolean allowFp16PrecisionForFp32
 
Boolean allowBufferHandleOutput
 
Boolean allowCancellation
 
Boolean useXNNPACK
 
final List< Delegate > delegates   = new ArrayList<>()
 
namespace com
 
namespace mediatek
 
namespace neuropilot_S
 
file Interpreter.java
 
page deprecated

 

Member com.mediatek.neuropilot_S.Interpreter.Interpreter  (@NonNull File modelFile, int numThreads)

Prefer using the Interpreter(File,Options) constructor. This method will be removed in a future release.

Member com.mediatek.neuropilot_S.Interpreter.Interpreter (@NonNull ByteBuffer byteBuffer, int numThreads)

Prefer using the Interpreter(ByteBuffer,Options) constructor. This method will be removed in a future release.

Member com.mediatek.neuropilot_S.Interpreter.Interpreter  (@NonNull MappedByteBuffer mappedByteBuffer)

Prefer using the Interpreter(ByteBuffer,Options) constructor. This method will be removed in a future release.

Member com.mediatek.neuropilot_S.Interpreter.modifyGraphWithDelegate  (Delegate delegate)

Prefer using Interpreter.Options#addDelegate to provide delegates at creation time. This method will be removed in a future release.

Member com.mediatek.neuropilot_S.Interpreter.setNumThreads  (int numThreads)

Prefer using Interpreter.Options#setNumThreads(int) directly for controlling multi-threading. This method will be removed in a future release.

 

5.2.2. NeuronDelegate.java

class NeuronDelegate : public Delegate, public AutoCloseable

Delegate for Neuron inference.

Public Functions

inline NeuronDelegate(Options options)
 
inline NeuronDelegate()
 
inline long getNativeHandle()
 
inline void close()

Frees TFLite resources in C runtime.

User is expected to call this method explicitly.

Private Members

long delegateHandle
 

Private Static Functions

static native long createDelegate (int preference, String deviceName, String cacheDir, String modelToken, int maxDelegatedPartitions, boolean allowFp16, int executionPriority, boolean enableLowLatency, boolean enableDeepFusion, boolean enableBatchProcessing, int boostValue, int boostDuration, String compileOptions, boolean useAhwb, boolean useIon)
 
static native void deleteDelegate (long delegateHandle)
 

Private Static Attributes

static final long INVALID_DELEGATE_HANDLE   = 0
 
class Options

Delegate options.

Public Functions

inline Options()
 
inline Options setExecutionPreference(int preference)

Sets the inference preference for precision/compilation/runtime tradeoffs.

Parameters:

preference – One of EXECUTION_PREFERENCE_LOW_POWER, EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER, and EXECUTION_PREFERENCE_SUSTAINED_SPEED.

inline Options setExecutionPrioriy(int executionPriority)

Sets execution priority.

Parameters:

executionPriority – One of EXECUTION_PRIORITY_LOW, EXECUTION_PRIORITY_MEDIUM, and EXECUTION_PRIORITY_HIGH.

inline Options setEnableLowLatency(boolean enableLowLatency)

Sets whether to enable low latency.

inline Options setEnableDeepFusion(boolean enableDeepFusion)

Sets whether to enable deep fusion.

inline Options setEnableBatchProcessing(boolean enableBatchProcessing)

Sets whether to enable batch processing.

inline Options setBoostValue(int boostValue)

Sets boost value.

inline Options setBoostDuration(int boostDuration)

Sets boost duration.

inline Options setCompileOptions(String compileOptions)

Sets compile options for the Neuron delegate.

Only effective on Android 12 (API level 31) and above.

inline Options setAcceleratorName(String name)

Specifies the name of the target accelerator to be used by NNAPI. If this parameter is specified the setUseNnapiCpu(boolean) method won’t have any effect.

Only effective on Android 10 (API level 29) and above.

inline Options setCacheDir(String cacheDir)

Configure the location to be used to store model compilation cache entries. If either the cacheDir or modelToken parameter is unset, NNAPI caching will be disabled.

Only effective on Android 10 (API level 29) and above.

inline Options setModelToken(String modelToken)

Sets the token to be used to identify this model in the model compilation cache. If either the cacheDir or modelToken parameter is unset, NNAPI caching will be disabled.

Only effective on Android 10 (API level 29) and above.

inline Options setMaxNumberOfDelegatedPartitions(int limit)

Sets the maximum number of graph partitions that the delegate will try to delegate. If more partitions could be delegated than the limit, the ones with the larger number of nodes will be chosen. If unset it will use the NNAPI default limit.

inline Options setAllowFp16(boolean enable)

Enable or disable allowing fp32 computation to be run in fp16 in NNAPI. See https://source.android.com/devices/neural-networks#android-9

Only effective on Android 9 (API level 28) and above.

inline Options setUseIon(boolean useIon)

Enable or disable the use of Ion.

Only effective on Android 11 (API level 30)

inline Options setUseAhwb(boolean useAhwb)

Enable or disable the use of Ahwb (AHardwareBuffer).

Only effective on Android 12 (API level 31)

Public Static Attributes

static final int EXECUTION_PREFERENCE_UNDEFINED   = -1

Undefined; specifies the default behavior. Currently, the default setting of Neuron is EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER.

static final int EXECUTION_PREFERENCE_LOW_POWER   = 0

Prefer executing in a way that minimizes battery drain. This is desirable for compilations that will be executed often.

static final int EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER   = 1

Prefer returning a single answer as fast as possible, even if this causes more power consumption.

static final int EXECUTION_PREFERENCE_SUSTAINED_SPEED   = 2

Prefer maximizing the throughput of successive frames, for example when processing successive frames coming from the camera.

static final int EXECUTION_PRIORITY_LOW   = 90
 
static final int EXECUTION_PRIORITY_MEDIUM   = 100
 
static final int EXECUTION_PRIORITY_HIGH   = 110
 

Private Members

int executionPreference = EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER
 
String acceleratorName = null
 
String cacheDir = null
 
String modelToken = null
 
Integer maxDelegatedPartitions = null
 
Boolean allowFp16 = null
 
int executionPriority = EXECUTION_PRIORITY_MEDIUM
 
boolean enableLowLatency
 
boolean enableDeepFusion
 
boolean enableBatchProcessing
 
int boostValue = -1
 
int boostDuration
 
String compileOptions = null
 
boolean useAhwb = true
 
boolean useIon
 
namespace com
 
namespace mediatek
 
namespace neuropilot_S
 
namespace neuron
 
file NeuronDelegate.java
 

5.2.3. NnApiDelegate.java

class NnApiDelegate : public Delegate, public AutoCloseable

Delegate for NNAPI inference.

Public Functions

inline NnApiDelegate(Options options)
 
inline NnApiDelegate()
 
inline long getNativeHandle()
 
inline void close()

Frees TFLite resources in C runtime.

User is expected to call this method explicitly.

inline int getNnapiErrno()

Returns the latest error code returned by an NNAPI call or zero if NO calls to NNAPI failed. The error code is reset when the delegate is associated with an interpreter.

For details on NNAPI error codes see the NNAPI documentation.

Throws:

IllegalStateException – if the method is called after close.

inline boolean hasErrors()

Returns true if any NNAPI call failed since this delegate was associated with an interpreter.

Throws:

IllegalStateException – if the method is called after close.

Private Functions

inline void checkNotClosed()
 

Private Members

long delegateHandle
 

Private Static Functions

static native long createDelegate (int preference, String deviceName, String cacheDir, String modelToken, int maxDelegatedPartitions, boolean overrideDisallowCpu, boolean disallowCpuValue, boolean allowFp16, int executionPriority, long maxCompilationTimeoutDurationNs, long maxExecutionTimeoutDurationNs, long maxExecutionLoopTimeoutDurationNs)
 
static native void deleteDelegate (long delegateHandle)
 
static native int getNnapiErrno (long delegateHandle)
 

Private Static Attributes

static final long INVALID_DELEGATE_HANDLE   = 0
 
class Options

Delegate options.

Public Functions

inline Options()
 
inline Options setExecutionPreference(int preference)

Sets the inference preference for precision/compilation/runtime tradeoffs.

Parameters:

preference – One of EXECUTION_PREFERENCE_LOW_POWER, EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER, and EXECUTION_PREFERENCE_SUSTAINED_SPEED.

inline Options setExecutionPrioriy(int executionPriority)

Sets execution priority.

Parameters:

executionPriority – One of EXECUTION_PRIORITY_LOW, EXECUTION_PRIORITY_MEDIUM, and EXECUTION_PRIORITY_HIGH.

inline Options setMaxCompilationTimeoutDurationNs(long maxCompilationTimeoutDurationNs)

Sets max compilation timeout duration in ns.

inline Options setMaxExecutionTimeoutDurationNs(long maxExecutionTimeoutDurationNs)

Sets max execution timeout duration in ns.

inline Options setMaxExecutionLoopTimeoutDurationNs(long maxExecutionLoopTimeoutDurationNs)

Sets max execution loop timeout duration in ns.

inline Options setAcceleratorName(String name)

Specifies the name of the target accelerator to be used by NNAPI. If this parameter is specified the setUseNnapiCpu(boolean) method won’t have any effect.

Only effective on Android 10 (API level 29) and above.

inline Options setCacheDir(String cacheDir)

Configure the location to be used to store model compilation cache entries. If either the cacheDir or modelToken parameter is unset, NNAPI caching will be disabled.

Only effective on Android 10 (API level 29) and above.

inline Options setModelToken(String modelToken)

Sets the token to be used to identify this model in the model compilation cache. If either the cacheDir or modelToken parameter is unset, NNAPI caching will be disabled.

Only effective on Android 10 (API level 29) and above.

inline Options setMaxNumberOfDelegatedPartitions(int limit)

Sets the maximum number of graph partitions that the delegate will try to delegate. If more partitions could be delegated than the limit, the ones with the larger number of nodes will be chosen. If unset it will use the NNAPI default limit.

inline Options setUseNnapiCpu(boolean enable)

Enable or disable the NNAPI CPU Device “nnapi-reference”. If unset it will use the NNAPI default settings.

Only effective on Android 10 (API level 29) and above.

inline Options setAllowFp16(boolean enable)

Enable or disable allowing fp32 computation to be run in fp16 in NNAPI. See https://source.android.com/devices/neural-networks#android-9

Only effective on Android 9 (API level 28) and above.

Public Static Attributes

static final int EXECUTION_PREFERENCE_UNDEFINED   = -1

Undefined; specifies the default behavior. Currently, the default setting of NNAPI is EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER.

static final int EXECUTION_PREFERENCE_LOW_POWER   = 0

Prefer executing in a way that minimizes battery drain. This is desirable for compilations that will be executed often.

static final int EXECUTION_PREFERENCE_FAST_SINGLE_ANSWER   = 1

Prefer returning a single answer as fast as possible, even if this causes more power consumption.

static final int EXECUTION_PREFERENCE_SUSTAINED_SPEED   = 2

Prefer maximizing the throughput of successive frames, for example when processing successive frames coming from the camera.

static final int EXECUTION_PRIORITY_LOW   = 90
 
static final int EXECUTION_PRIORITY_MEDIUM   = 100
 
static final int EXECUTION_PRIORITY_HIGH   = 110
 

Private Members

int executionPreference = EXECUTION_PREFERENCE_UNDEFINED
 
String acceleratorName = null
 
String cacheDir = null
 
String modelToken = null
 
Integer maxDelegatedPartitions = null
 
Boolean useNnapiCpu = null
 
Boolean allowFp16 = null
 
int executionPriority = EXECUTION_PRIORITY_MEDIUM
 
long maxCompilationTimeoutDurationNs
 
long maxExecutionTimeoutDurationNs
 
long maxExecutionLoopTimeoutDurationNs
 
namespace com
 
namespace mediatek
 
namespace neuropilot_S
 
namespace nnapi
 
file NnApiDelegate.java
 
 

5.3. Neuron Adapter API Reference

Typedefs

typedef struct NeuronModel NeuronModel

NeuronModel is an opaque type that contains a description of the mathematical operations that constitute the model.

typedef struct NeuronCompilation NeuronCompilation

NeuronCompilation is an opaque type that can be used to compile a machine learning model.

typedef struct NeuronExecution NeuronExecution

NeuronExecution is an opaque type that can be used to apply a machine learning model to a set of inputs.

typedef struct NeuronDevice NeuronDevice

NeuronDevice is an opaque type that represents a device.

This type is used to query basic properties and supported operations of the corresponding device, and control which device(s) a model is to be run on.

Available since 4.1.0

typedef struct NeuronMemory NeuronMemory

This type is used to represent shared memory, memory mapped files, and similar memories.

It is the application’s responsibility to ensure that there are no uses of the memory after calling NeuronMemory_free. This includes the execution which references this memory because of a call to NeuronExecution_setInputFromMemory or NeuronExecution_setOutputFromMemory.

Available since 4.1.0

typedef struct NeuronEvent NeuronEvent

NeuronEvent is an opaque type that represents an event that will be signaled once an execution completes.

Available since 5.0.0

typedef struct NeuronOperandType NeuronOperandType

NeuronOperandType describes the type of an operand. This structure is used to describe both scalars and tensors.

typedef struct NeuronSymmPerChannelQuantParams NeuronSymmPerChannelQuantParams

Parameters for NEURON_TENSOR_QUANT8_SYMM_PER_CHANNEL operand.

Enums

enum NeuronAdapterResultCode

Result codes.

Values:

enumerator NEURON_NO_ERROR
 
enumerator NEURON_OUT_OF_MEMORY
 
enumerator NEURON_INCOMPLETE
 
enumerator NEURON_UNEXPECTED_NULL
 
enumerator NEURON_BAD_DATA
 
enumerator NEURON_OP_FAILED
 
enumerator NEURON_UNMAPPABLE
 
enumerator NEURON_BAD_STATE
 
enumerator NEURON_BAD_VERSION
 
enumerator NEURON_OUTPUT_INSUFFICIENT_SIZE
 
enumerator NEURON_UNAVAILABLE_DEVICE
 
enumerator NEURON_MISSED_DEADLINE_TRANSIENT
 
enumerator NEURON_MISSED_DEADLINE_PERSISTENT
 
enumerator NEURON_RESOURCE_EXHAUSTED_TRANSIENT
 
enumerator NEURON_RESOURCE_EXHAUSTED_PERSISTENT
 
enumerator NEURON_DEAD_OBJECT
 
enum [anonymous]

Operand values with size in bytes that are smaller or equal to this will be immediately copied into the model.

Values:

enumerator NEURON_MAX_SIZE_OF_IMMEDIATELY_COPIED_VALUES
 
enum [anonymous]

Size of the cache token, in bytes, required from the application.

Values:

enumerator NEURON_BYTE_SIZE_OF_CACHE_TOKEN
 
enum [anonymous]

Operand types. The type of operands that can be added to a model.

Some notes on quantized tensors

NEURON_TENSOR_QUANT8_ASYMM

Attached to this tensor are two numbers that can be used to convert the 8 bit integer to the real value and vice versa. These two numbers are:

  • scale: a 32 bit floating point value greater than zero.

  • zeroPoint: a 32 bit integer, in range [0, 255].

 

The formula is: real_value = (integer_value - zeroPoint) * scale.
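For example, with scale = 0.5 and zeroPoint = 128, a stored 8 bit value of 200 represents (200 - 128) * 0.5 = 36.0.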

NEURON_TENSOR_QUANT16_SYMM

Attached to this tensor is a number representing real value scale that is used to convert the 16 bit number to a real value in the following way: realValue = integerValue * scale. scale is a 32 bit floating point with value greater than zero.

NEURON_TENSOR_QUANT8_SYMM_PER_CHANNEL

This tensor is associated with additional fields that can be used to convert the 8 bit signed integer to the real value and vice versa. These fields are:

  • channelDim: a 32 bit unsigned integer indicating channel dimension.

  • scales: an array of positive 32 bit floating point values.

 

The size of the scales array must be equal to dimensions[channelDim]. NeuronModel_setOperandSymmPerChannelQuantParams must be used to set the parameters for an Operand of this type. The channel dimension of this tensor must not be unknown (dimensions[channelDim] != 0). The formula is: realValue[…, C, …] = integerValue[…, C, …] * scales[C] where C is an index in the Channel dimension.

NEURON_TENSOR_QUANT16_ASYMM

Attached to this tensor are two numbers that can be used to convert the 16 bit integer to the real value and vice versa. These two numbers are:

  • scale: a 32 bit floating point value greater than zero.

  • zeroPoint: a 32 bit integer, in range [0, 65535].

 

The formula is: real_value = (integer_value - zeroPoint) * scale.

NEURON_TENSOR_QUANT8_SYMM

Attached to this tensor is a number representing real value scale that is used to convert the 8 bit number to a real value in the following way: realValue = integerValue * scale. scale is a 32 bit floating point with value greater than zero.

NEURON_TENSOR_QUANT8_ASYMM_SIGNED

Attached to this tensor are two numbers that can be used to convert the 8 bit integer to the real value and vice versa. These two numbers are:

  • scale: a 32 bit floating point value greater than zero.

  • zeroPoint: a 32 bit integer, in range [-128, 127].

 

The formula is: real_value = (integer_value - zeroPoint) * scale.

Values:

enumerator NEURON_FLOAT32

A 32 bit floating point scalar value.

enumerator NEURON_INT32

A signed 32 bit integer scalar value.

enumerator NEURON_UINT32

An unsigned 32 bit integer scalar value.

enumerator NEURON_TENSOR_FLOAT32

A tensor of 32 bit floating point values.

enumerator NEURON_TENSOR_INT32

A tensor of 32 bit integer values.

enumerator NEURON_TENSOR_QUANT8_ASYMM

A tensor of 8 bit integers that represent real numbers.

enumerator NEURON_BOOL

An 8 bit boolean scalar value.

enumerator NEURON_TENSOR_QUANT16_SYMM

A tensor of 16 bit signed integers that represent real numbers.

enumerator NEURON_TENSOR_FLOAT16

A tensor of IEEE 754 16 bit floating point values.

enumerator NEURON_TENSOR_BOOL8

A tensor of 8 bit boolean values.

enumerator NEURON_FLOAT16

An IEEE 754 16 bit floating point scalar value.

enumerator NEURON_TENSOR_QUANT8_SYMM_PER_CHANNEL

A tensor of 8 bit signed integers that represent real numbers.

enumerator NEURON_TENSOR_QUANT16_ASYMM

A tensor of 16 bit unsigned integers that represent real numbers.

enumerator NEURON_TENSOR_QUANT8_SYMM

A tensor of 8 bit signed integers that represent real numbers.

enumerator NEURON_TENSOR_QUANT8_ASYMM_SIGNED

A tensor of 8 bit signed integers that represent real numbers.

enumerator NEURON_MODEL

A reference to a model.

enumerator NEURON_EXT_TENSOR_UINT32

Extended data type - tensor uint32

enum NeuronOperationType

Operation Types

Supported operations are listed with available versions. See Neuron_getVersion for querying version number.

Attempting to compile models with operations marked as not available will result in a compilation failure.

Refer to the operation support status of each hardware platform. Attempting to compile models with operations supported by this library but not supported by the underlying hardware platform will also result in a compilation failure.

Compatible NNAPI levels are also listed.

Values:

enumerator NEURON_ADD

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_AVERAGE_POOL_2D

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_CONCATENATION

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_CONV_2D

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_DEPTHWISE_CONV_2D

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_DEPTH_TO_SPACE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_DEQUANTIZE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_EMBEDDING_LOOKUP

Not available.

enumerator NEURON_FLOOR

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_FULLY_CONNECTED

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_HASHTABLE_LOOKUP

Not available.

enumerator NEURON_L2_NORMALIZATION

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_L2_POOL_2D

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_LOCAL_RESPONSE_NORMALIZATION

Not available.

enumerator NEURON_LOGISTIC

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_LSH_PROJECTION

Not available.

enumerator NEURON_LSTM

Not available.

enumerator NEURON_MAX_POOL_2D

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_MUL

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_RELU

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_RELU1

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_RELU6

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_RESHAPE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_RESIZE_BILINEAR

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_RNN

Not available.

enumerator NEURON_SOFTMAX

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SPACE_TO_DEPTH

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SVDF

Not available.

enumerator NEURON_TANH

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_BATCH_TO_SPACE_ND

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_DIV

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_MEAN

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_PAD

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SPACE_TO_BATCH_ND

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SQUEEZE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_STRIDED_SLICE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SUB

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_TRANSPOSE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_ABS

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_ARGMAX

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_ARGMIN

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_AXIS_ALIGNED_BBOX_TRANSFORM

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_BIDIRECTIONAL_SEQUENCE_LSTM

Not available.

enumerator NEURON_BIDIRECTIONAL_SEQUENCE_RNN

Not available.

enumerator NEURON_BOX_WITH_NMS_LIMIT

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_CAST

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_CHANNEL_SHUFFLE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_DETECTION_POSTPROCESSING

Not available.

enumerator NEURON_EQUAL

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_EXP

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_EXPAND_DIMS

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_GATHER

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_GENERATE_PROPOSALS

Not available.

enumerator NEURON_GREATER

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_GREATER_EQUAL

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_GROUPED_CONV_2D

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_HEATMAP_MAX_KEYPOINT

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_INSTANCE_NORMALIZATION

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_LESS

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_LESS_EQUAL

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_LOG

Not available.

enumerator NEURON_LOGICAL_AND

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_LOGICAL_NOT

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_LOGICAL_OR

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_LOG_SOFTMAX

Not available.

enumerator NEURON_MAXIMUM

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_MINIMUM

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_NEG

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_NOT_EQUAL

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_PAD_V2

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_POW

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_PRELU

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_QUANTIZE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_QUANTIZED_16BIT_LSTM

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_RANDOM_MULTINOMIAL

Not available.

enumerator NEURON_REDUCE_ALL

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_REDUCE_ANY

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_REDUCE_MAX

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_REDUCE_MIN

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_REDUCE_PROD

Not available.

enumerator NEURON_REDUCE_SUM

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_ROI_ALIGN

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_ROI_POOLING

Not available.

enumerator NEURON_RSQRT

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SELECT

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SIN

Not available.

enumerator NEURON_SLICE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SPLIT

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_SQRT

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_TILE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_TOPK_V2

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_TRANSPOSE_CONV_2D

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_UNIDIRECTIONAL_SEQUENCE_LSTM

Not available.

enumerator NEURON_UNIDIRECTIONAL_SEQUENCE_RNN

Not available.

enumerator NEURON_RESIZE_NEAREST_NEIGHBOR

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_QUANTIZED_LSTM

Not available.

enumerator NEURON_IF

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_WHILE

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_ELU

Not available.

enumerator NEURON_HARD_SWISH

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_FILL

Available since 4.1.0. NNAPI level 30.

enumerator NEURON_RANK

Not available.

enumerator NEURON_BATCH_MATMUL

Available since 5.1.2. NNAPI FL6.

enumerator NEURON_NUMBER_OF_OPERATIONS
 
enum NeuronAdapterFuseCode

Fused activation function types.

Values:

enumerator NEURON_FUSED_NONE
 
enumerator NEURON_FUSED_RELU
 
enumerator NEURON_FUSED_RELU1
 
enumerator NEURON_FUSED_RELU6
 
enum NeuronAdapterPaddingCode

Implicit padding algorithms.

Values:

enumerator NEURON_PADDING_SAME

SAME padding. Padding on both ends is the “same”: padding_to_beginning = total_padding / 2; padding_to_end = (total_padding + 1) / 2. That is, for an even amount of total padding, both ends receive exactly the same padding; for an odd amount, the padding at the end is larger than the padding at the beginning by 1.

total_padding is a function of the input, stride, and filter size. It can be computed as follows: out_size = (input_size + stride - 1) / stride; needed_input = (out_size - 1) * stride + filter_size; total_padding = max(0, needed_input - input_size). The computation is the same for the horizontal and vertical directions.
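For example, with input_size = 6, stride = 2, and filter_size = 3: out_size = (6 + 2 - 1) / 2 = 3, needed_input = (3 - 1) * 2 + 3 = 7, total_padding = max(0, 7 - 6) = 1, giving padding_to_beginning = 0 and padding_to_end = 1.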

enumerator NEURON_PADDING_VALID

VALID padding. No padding. When the input size is not evenly divisible by the filter size, the input at the end that could not fill the whole filter tile will simply be ignored.

enum NeuronAdapterPreferenceCode

Execution preferences.

Values:

enumerator NEURON_PREFER_LOW_POWER
 
enumerator NEURON_PREFER_FAST_SINGLE_ANSWER
 
enumerator NEURON_PREFER_SUSTAINED_SPEED
 
enumerator NEURON_PREFER_TURBO_BOOST
 
enum NeuronAdapterPriorityCode

Relative execution priority.

Values:

enumerator NEURON_PRIORITY_LOW
 
enumerator NEURON_PRIORITY_MEDIUM
 
enumerator NEURON_PRIORITY_HIGH
 
enumerator NEURON_PRIORITY_DEFAULT
 
enum OptimizationCode

Compiler optimization hint.

Values:

enumerator NEURON_OPTIMIZATION_NORMAL

Normal optimization. Available since 4.3.1

enumerator NEURON_OPTIMIZATION_LOW_LATENCY

Reduce latency by utilizing as many APU cores as possible. Available since 4.3.1

enumerator NEURON_OPTIMIZATION_DEEP_FUSION

Reduce DRAM access as much as possible. Available since 4.4.0

enumerator NEURON_OPTIMIZATION_BATCH_PROCESSING

Reduce latency by using as many APU cores as possible in batch-dimension. (For models with batch > 1) Available since 4.4.0

enumerator NEURON_OPTIMIZATION_DEFAULT

Default optimization setting. Available since 4.3.1

enum CacheFlushCode

CPU cache flush hint.

Values:

enumerator NEURON_CACHE_FLUSH_ENABLE_ALL

Sync input buffer and invalidate output buffer. Available since 5.0.1

enumerator NEURON_CACHE_FLUSH_DISABLE_SYNC_INPUT

Disable sync input buffer. Available since 5.0.1

enumerator NEURON_CACHE_FLUSH_DISABLE_INVALIDATE_OUTPUT

Disable invalidate output buffer. Available since 5.0.1

enumerator NEURON_CACHE_FLUSH_DEFAULT

Default cache flush setting. Available since 5.0.1

Functions

int Neuron_getVersion(NeuronRuntimeVersion *version)

Get the version of Neuron runtime library.

Parameters:

version – the version of Neuron runtime library.

Returns:

NEURON_NO_ERROR

int Neuron_getL1MemorySizeKb(uint32_t *sizeKb)

Get the size of L1 memory in APU.

Available since 4.3.0

Parameters:

sizeKb – L1 memory size in KB

Returns:

NEURON_NO_ERROR if successful.
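For illustration only, the two query functions above can be combined as in the following sketch. The header name NeuronAdapter.h and the major/minor/patch fields of NeuronRuntimeVersion are assumptions, not part of this reference.

#include <stdint.h>
#include <stdio.h>
#include "NeuronAdapter.h"   /* assumed header exposing the Neuron Adapter API */

static void query_runtime_info(void) {
    NeuronRuntimeVersion version;   /* field names below are assumed */
    if (Neuron_getVersion(&version) == NEURON_NO_ERROR) {
        printf("Neuron runtime %u.%u.%u\n", version.major, version.minor, version.patch);
    }

    uint32_t l1SizeKb = 0;
    if (Neuron_getL1MemorySizeKb(&l1SizeKb) == NEURON_NO_ERROR) {
        /* The value can later be passed to NeuronCompilation_setL1MemorySizeKb. */
        printf("APU L1 memory: %u KB\n", l1SizeKb);
    }
}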

int NeuronMemory_createFromFd(size_t size, int protect, int fd, size_t offset, NeuronMemory **memory)

Creates a shared memory object from a file descriptor.

For ion descriptor, application should create the ion memory and descriptor first and then use it in this function.

Available since 4.1.0. Only supports an ion fd.

Parameters:

  • size – The requested size in bytes. Must not be larger than the file size.

  • protect – The desired memory protection for the mapping. It is either PROT_NONE or the bitwise OR of one or more of the following flags: PROT_READ, PROT_WRITE.

  • fd – The requested file descriptor. The file descriptor has to be mmap-able.

  • offset – The offset to the beginning of the file of the area to map.

  • memory – The memory object to be created. Set to NULL if unsuccessful.

int NeuronMemory_createFromAHardwareBuffer()

Not supported on non-Android platforms.

Returns:

NEURON_BAD_STATE

void NeuronMemory_free(NeuronMemory *memory)

Delete a memory object.

For ion memory, this function cleans up the internal resource associated with this memory. Applications should clean up the allocated ion memory after this function.

Available since 4.1.0

int NeuronModel_create(NeuronModel **model)

Create an empty NeuronModel. The model should be constructed with calls to NeuronModel_addOperation and NeuronModel_addOperand.

Available since 4.1.0

Parameters:

model – The NeuronModel to be created. Set to NULL if unsuccessful.

Returns:

NEURON_NO_ERROR if successful.

void NeuronModel_free(NeuronModel *model)

Destroy a model. The model need not have been finished by a call to NeuronModel_finish.

Available since 4.1.0

Parameters:

model – The model to be destroyed.

int NeuronModel_finish(NeuronModel *model)

Indicate that we have finished modifying a model. Required before calling NeuronCompilation_create.

Available since 4.1.0

Parameters:

model – The model to be finished.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_addOperand(NeuronModel *model, const NeuronOperandType *type)

Add an operand to a model. The order in which the operands are added is important. The first one added to a model will have the index value 0, the second 1, etc. These indexes are used as operand identifiers in NeuronModel_addOperation.

Available since 4.1.0

Parameters:
  • model – The model to be modified.

  • type – The NeuronOperandType that describes the shape of the operand. Neither the NeuronOperandType nor the dimensions it points to need to outlive the call to NeuronModel_addOperand.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_setOperandValue(NeuronModel *model, int32_t index, const void *buffer, size_t length)

Sets an operand to a constant value. Values of length smaller or equal to NEURON_MAX_SIZE_OF_IMMEDIATELY_COPIED_VALUES are immediately copied into the model. For values of length greater than NEURON_MAX_SIZE_OF_IMMEDIATELY_COPIED_VALUES, a pointer to the buffer is stored within the model. The application must not change the content of this region until all executions using this model have completed. As the data may be copied during processing, modifying the data after this call yields undefined results.

Attempting to modify a model once NeuronModel_finish has been called will return an error.

A special note on the buffer lifetime when the length is greater than NEURON_MAX_SIZE_OF_IMMEDIATELY_COPIED_VALUES: the provided buffer must outlive the compilation of this model. That is, the user must keep the buffer unchanged until NeuronCompilation_finish of this model. This is an internal optimization compared to NNAPI. In NNAPI, the NN runtime copies the buffer to a shared memory between the NN runtime and the NNAPI HIDL service during ANNModel_finish, and copies it again to the compiled result during ANNCompilation_finish. In Neuron Adapter, there is only one copy, made during NeuronCompilation_finish, so the buffer must be kept alive until NeuronCompilation_finish returns.

Available since 4.1.0

Parameters:
  • model – The model to be modified.

  • index – The index of the model operand we’re setting.

  • buffer – A pointer to the data to use.

  • length – The size in bytes of the data value.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_setOperandValueFromModel(NeuronModel *model, int32_t index, const NeuronModel *value)

Sets an operand to a value that is a reference to another NeuronModel.

The referenced model must already have been finished by a call to NeuronModel_finish.

The NeuronModel_relaxComputationFloat32toFloat16 setting of referenced models is overridden by that setting of the main model of a compilation.

The referenced model must outlive the model referring to it.

Attempting to modify a model once NeuronModel_finish has been called will return an error.

Available since 4.1.0

Parameters:
  • model – The model to be modified.

  • index – The index of the model operand we’re setting.

  • value – The model to be referenced.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_setOperandSymmPerChannelQuantParams(NeuronModel *model, int32_t index, const NeuronSymmPerChannelQuantParams *channelQuant)

Sets an operand’s per-channel quantization parameters. Sets the parameters required by a tensor of type NEURON_TENSOR_QUANT8_SYMM_PER_CHANNEL. This function must be called for every tensor of type NEURON_TENSOR_QUANT8_SYMM_PER_CHANNEL before calling NeuronModel_finish.

Available since 4.1.0

Parameters:
  • model – The model to be modified.

  • index – The index of the model operand we’re setting.

  • channelQuant – The per channel quantization parameters for the operand. No memory in this struct needs to outlive the call to this function.

Returns:

NEURON_NO_ERROR if successful.
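A rough sketch of setting per-channel parameters for a NEURON_TENSOR_QUANT8_SYMM_PER_CHANNEL weight operand follows. The field names of NeuronSymmPerChannelQuantParams (channelDim, scaleCount, scales) are assumed to mirror NNAPI’s ANeuralNetworksSymmPerChannelQuantParams and are not taken from this reference.

#include "NeuronAdapter.h"   /* assumed header name */

/* Assumes 'model' already holds a QUANT8_SYMM_PER_CHANNEL weight operand at
 * 'weightIndex' whose channel dimension 0 has 4 channels. */
static int set_weight_channel_scales(NeuronModel *model, int32_t weightIndex) {
    static const float scales[4] = {0.05f, 0.04f, 0.07f, 0.06f};
    NeuronSymmPerChannelQuantParams params = {
        .channelDim = 0,    /* assumed field: index of the channel dimension */
        .scaleCount = 4,    /* assumed field: number of entries in 'scales'  */
        .scales = scales,   /* assumed field: per-channel scale array        */
    };
    /* Must be called before NeuronModel_finish for every operand of this type. */
    return NeuronModel_setOperandSymmPerChannelQuantParams(model, weightIndex, &params);
}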

int NeuronModel_addOperation(NeuronModel *model, NeuronOperationType type, uint32_t inputCount, const uint32_t *inputs, uint32_t outputCount, const uint32_t *outputs)

Add an operation to a model. The operands specified by inputs and outputs must have been previously added by calls to NeuronModel_addOperand.

Available since 4.1.0

Parameters:
  • model – The model to be modified.

  • type – The NeuronOperationType of the operation.

  • inputCount – The number of entries in the inputs array.

  • inputs – An array of indexes identifying each operand.

  • outputCount – The number of entries in the outputs array.

  • outputs – An array of indexes identifying each operand.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_addOperationExtension(NeuronModel *model, const char *name, const char *vendor, const NeuronDevice *device, uint32_t inputCount, const uint32_t *inputs, uint32_t outputCount, const uint32_t *outputs)

Add an operation extension to a model. The operands specified by inputs and outputs must have been previously added by calls to NeuronModel_addOperand. User needs to specify the operation extension name and the desired device which will execute the operation extension.

Available since 4.1.0

Parameters:
  • model – The model to be modified.

  • name – The name of the operation extension.

  • vendor – The name of the vendor which will implement the operation extension.

  • device – The device which will execute the operation extension.

  • inputCount – The number of entries in the inputs array.

  • inputs – An array of indexes identifying each operand.

  • outputCount – The number of entries in the outputs array.

  • outputs – An array of indexes identifying each operand.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_identifyInputsAndOutputs(NeuronModel *model, uint32_t inputCount, const uint32_t *inputs, uint32_t outputCount, const uint32_t *outputs)

Specifies which operands will be the model’s inputs and outputs. An operand cannot be used for both input and output; doing so will return an error.

The operands specified by inputs and outputs must have been previously added by calls to NeuronModel_addOperand.

Attempting to modify a model once NeuronModel_finish has been called will return an error.

Available since 4.1.0

Parameters:
  • model – The model to be modified.

  • inputCount – The number of entries in the inputs array.

  • inputs – An array of indexes identifying the input operands.

  • outputCount – The number of entries in the outputs array.

  • outputs – An array of indexes identifying the output operands.

Returns:

NEURON_NO_ERROR if successful.
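The functions above compose into the usual build flow: create a model, add operands, set constants, add operations, identify the inputs and outputs, and finish the model. The sketch below builds a minimal element-wise ADD of two 1x4 float tensors; it is illustrative only. The field names of NeuronOperandType (type, dimensionCount, dimensions, scale, zeroPoint) and the operand layout of NEURON_ADD (input0, input1, fuse code) are assumed to mirror NNAPI and are not taken from this reference.

#include <stdint.h>
#include "NeuronAdapter.h"   /* assumed header name */

static int build_add_model(NeuronModel **outModel) {
    NeuronModel *model = NULL;
    if (NeuronModel_create(&model) != NEURON_NO_ERROR) return -1;

    uint32_t dims[2] = {1, 4};
    NeuronOperandType tensorType = {      /* field names assumed */
        .type = NEURON_TENSOR_FLOAT32,
        .dimensionCount = 2,
        .dimensions = dims,
        .scale = 0.0f,
        .zeroPoint = 0,
    };
    NeuronOperandType scalarType = {
        .type = NEURON_INT32, .dimensionCount = 0, .dimensions = NULL,
        .scale = 0.0f, .zeroPoint = 0,
    };

    /* Operand indices follow the order of NeuronModel_addOperand calls. */
    NeuronModel_addOperand(model, &tensorType);   /* 0: input a   */
    NeuronModel_addOperand(model, &tensorType);   /* 1: input b   */
    NeuronModel_addOperand(model, &scalarType);   /* 2: fuse code */
    NeuronModel_addOperand(model, &tensorType);   /* 3: output    */

    /* Small constant: immediately copied into the model. */
    int32_t fuse = NEURON_FUSED_NONE;
    NeuronModel_setOperandValue(model, 2, &fuse, sizeof(fuse));

    uint32_t opInputs[3] = {0, 1, 2};
    uint32_t opOutputs[1] = {3};
    NeuronModel_addOperation(model, NEURON_ADD, 3, opInputs, 1, opOutputs);

    /* Only the data tensors are model inputs/outputs; the fuse code is a constant. */
    uint32_t modelInputs[2] = {0, 1};
    uint32_t modelOutputs[1] = {3};
    NeuronModel_identifyInputsAndOutputs(model, 2, modelInputs, 1, modelOutputs);

    if (NeuronModel_finish(model) != NEURON_NO_ERROR) {
        NeuronModel_free(model);
        return -1;
    }
    *outModel = model;
    return 0;
}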

int NeuronModel_getSupportedOperations(NeuronModel *model, bool *supported, uint32_t operationCount)

Gets the supported operations in a model. This function must be called after calling NeuronModel_finish

Available since 4.1.0

Parameters:
  • model – The model to be queried.

  • supported – The boolean array to be filled. True means supported. The size of the boolean array must be at least as large as the number of operations in the model. The order of elements in the supported array matches the order in which the corresponding operations were added to the model.

  • operationCount – number of operations in the model

Returns:

NEURON_NO_ERROR if successful.
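As a sketch, the support status of each operation in a finished model can be checked as follows; 'opCount' must match the number of operations added to the model (1 for the ADD example sketched later in this section).

#include <stdbool.h>
#include <stdio.h>
#include "NeuronAdapter.h"   /* assumed header name */

static void report_supported_ops(NeuronModel *model, uint32_t opCount) {
    bool supported[16] = {false};   /* large enough for this sketch */
    if (opCount > 16) return;
    if (NeuronModel_getSupportedOperations(model, supported, opCount) == NEURON_NO_ERROR) {
        for (uint32_t i = 0; i < opCount; ++i) {
            printf("operation %u: %s\n", i, supported[i] ? "supported" : "not supported");
        }
    }
}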

int NeuronModel_getSupportedOperationsForDevices(const NeuronModel *model, const NeuronDevice *const *devices, uint32_t numDevices, bool *supportedOps)

Get the supported operations for a specified set of devices. If multiple devices are selected, the supported operation list is a union of supported operations of all selected devices.

Available since 4.1.0

Parameters:
  • model – The model to be queried.

  • devices – Selected devices

  • numDevices – Number of selected devices

  • supportedOps – The boolean array to be filled. True means supported. The size of the boolean array must be at least as large as the number of operations in the model. The order of elements in the supportedOps array matches the order in which the corresponding operations were added to the model.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_relaxComputationFloat32toFloat16(NeuronModel *model, bool allow)

Specifies whether NEURON_TENSOR_FLOAT32 is allowed to be calculated with range and/or precision as low as that of the IEEE 754 16-bit floating-point format. By default, NEURON_TENSOR_FLOAT32 must be calculated using at least the range and precision of the IEEE 754 32-bit floating-point format.

Available since 4.1.0

Parameters:
  • model – The model to be modified.

  • allow – ‘true’ indicates NEURON_TENSOR_FLOAT32 may be calculated with range and/or precision as low as that of the IEEE 754 16-bit floating point format. ‘false’ indicates NEURON_TENSOR_FLOAT32 must be calculated using at least the range and precision of the IEEE 754 32-bit floating point format.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_suppressInputConversion(NeuronModel *model, bool suppress)

Hint the compiler to suppress the input data conversion; the user has to convert the input data into the platform-expected format before inference.

Available since 4.2.0

Parameters:
  • model – The model to be modified.

  • suppress – True to suppress the input data conversion.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_suppressOutputConversion(NeuronModel *model, bool suppress)

Hint the compiler to suppress the output data conversion; the user has to convert the output data from the platform-generated format after inference.

Available since 4.2.0

Parameters:
  • model – The model to be modified.

  • suppress – True to suppress the output data conversion.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_restoreFromCompiledNetwork(NeuronModel **model, NeuronCompilation **compilation, const void *buffer, const size_t size)

Restore the compiled network using user provided buffer.

The restored NeuronCompilation can be used to create an execution instance. The restored NeuronModel cannot be recompiled.

Available since 4.3.0

Parameters:
  • model – Restored model.

  • compilation – Restored compilation

  • buffer – User provided buffer to restore the compiled network.

  • size – Size of the user provided buffer in bytes.

Returns:

NEURON_NO_ERROR if the compiled network is successfully restored from the user provided buffer. NEURON_BAD_DATA if it fails to load the compiled network, either because the version does not match or because the data is corrupted.

int NeuronCompilation_create(NeuronModel *model,  NeuronCompilation **compilation)

Create a NeuronCompilation to compile the given model.

This function only creates the object. Compilation is only performed once NeuronCompilation_finish is invoked. NeuronCompilation_finish should be called once all desired properties have been set on the compilation. NeuronCompilation_free should be called once the compilation is no longer needed. The provided model must outlive the compilation. The model must already have been finished by a call to NeuronModel_finish.

Available since 4.1.0

Parameters:
  • model – The NeuronModel to be compiled.

  • compilation – The newly created object or NULL if unsuccessful.

Returns:

NEURON_NO_ERROR if successful

void NeuronCompilation_free(NeuronCompilation *compilation)

Destroy a compilation.

Available since 4.1.0

Parameters:

compilation – The compilation to be destroyed.

int NeuronCompilation_finish(NeuronCompilation *compilation)

Compilation is finished once NeuronCompilation_finish is invoked. Required before calling NeuronExecution_create. This function must only be called once for a given compilation.

Available since 4.1.0

Parameters:

compilation – The compilation to be finished.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_setCaching(NeuronCompilation *compilation, const char *cacheDir, const uint8_t *token)

Provides optional caching information for faster re-compilation.

Available since 4.1.0

Parameters:
  • compilation – The compilation to be cached.

  • cacheDir – The cache directory for storing and retrieving caching data. The user should choose a directory local to the application, and is responsible for managing the cache entries.

  • token – The token provided by the user to specify a model must be of length NEURON_BYTE_SIZE_OF_CACHE_TOKEN. The user should ensure that the token is unique to a model within the application. Neuron cannot detect token collisions; a collision will result in a failed execution or in a successful execution that produces incorrect output values.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_setL1MemorySizeKb(NeuronCompilation *compilation, uint32_t sizeKb)

Hint the compiler with the size of L1 memory; this value should not be larger than the real platform’s setting. The user can get the platform’s L1 memory size in KB by calling Neuron_getL1MemorySizeKb.

Available since 4.3.0

Parameters:
  • compilation – The compilation to be modified.

  • sizeKb – L1 memory size in KB.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_createForDevices(NeuronModel *model, const NeuronDevice *const *devices, uint32_t numDevices, NeuronCompilation **compilation)

Create a NeuronCompilation to compile the given model for a specified set of devices. The user must handle all compilation and execution failures from the specified set of devices. This is in contrast to a use of NeuronCompilation_create, where Neuron will attempt to recover from such failures.

Available since 4.1.0

Parameters:
  • model – The NeuronModel to be compiled.

  • devices – The set of devices. Must not contain duplicates.

  • numDevices – The number of devices in the set.

  • compilation – The newly created object or NULL if unsuccessful.

Returns:

NEURON_NO_ERROR if successful, NEURON_BAD_DATA if the model is invalid.

int NeuronCompilation_createForDebug(NeuronModel *model, NeuronCompilation **compilation)

Create a NeuronCompilation which can divide one graph into several subgraphs and use the information for debugging.

To be used only for debugging; performance and thread safety are not guaranteed.

Available since 5.0.0

Parameters:
  • model – The NeuronModel to be compiled.

  • compilation – The newly created object or NULL if unsuccessful.

Returns:

NEURON_NO_ERROR if successful, NEURON_BAD_DATA if the model is invalid.

int NeuronCompilation_setPreference(NeuronCompilation *compilation, int32_t preference)

Sets the execution preference associated with this compilation.

The default value of preference is NEURON_PREFER_FAST_SINGLE_ANSWER

Available since 4.1.0

Parameters:
  • compilation – The compilation to be modified.

  • preference – Either NEURON_PREFER_LOW_POWER, NEURON_PREFER_FAST_SINGLE_ANSWER, or NEURON_PREFER_SUSTAINED_SPEED.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_setPriority(NeuronCompilation *compilation, int priority)

Sets the execution priority associated with this compilation.

Execution priorities are relative to other executions created by the same application (specifically same uid) for the same device. Specifically, priorities of executions from one application will not affect executions from another application.

Higher priority executions may use more compute resources than lower priority executions, and may preempt or starve lower priority executions.

Available since 4.1.0

Parameters:
  • compilation – The compilation to be modified.

  • priority – The relative priority of the execution compared to other executions created by the application. Must be one of NEURON_PRIORITY_*.

Returns:

NEURON_NO_ERROR if successful.
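A minimal compilation sketch combining the calls above follows; it is illustrative only. The cache directory path and the placeholder token value are examples, and NEURON_BYTE_SIZE_OF_CACHE_TOKEN is used to size the token array.

#include <stdint.h>
#include <string.h>
#include "NeuronAdapter.h"   /* assumed header name */

/* Compile a finished model with caching, preference, and priority configured. */
static int compile_model(NeuronModel *model, const char *cacheDir,
                         NeuronCompilation **outCompilation) {
    NeuronCompilation *compilation = NULL;
    if (NeuronCompilation_create(model, &compilation) != NEURON_NO_ERROR) return -1;

    /* The token must uniquely identify the model within the application. */
    uint8_t token[NEURON_BYTE_SIZE_OF_CACHE_TOKEN];
    memset(token, 0, sizeof(token));
    token[0] = 0x42;   /* placeholder; derive a real token from the model contents */

    NeuronCompilation_setCaching(compilation, cacheDir, token);
    NeuronCompilation_setPreference(compilation, NEURON_PREFER_SUSTAINED_SPEED);
    NeuronCompilation_setPriority(compilation, NEURON_PRIORITY_DEFAULT);

    if (NeuronCompilation_finish(compilation) != NEURON_NO_ERROR) {
        NeuronCompilation_free(compilation);
        return -1;
    }
    *outCompilation = compilation;
    return 0;
}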

int NeuronCompilation_getInputPaddedDimensions(NeuronCompilation *compilation, int32_t index, uint32_t *dimensions)

Get the padded dimensional information of the specified input operand of the compilation. This function must be called after calling NeuronCompilation_finish. If NeuronModel_suppressInputConversion was not applied to the model to be compiled, the returned dimensions are the padded dimensions after NeuronCompilation_finish, which satisfy the optimization requirements of the underlying hardware accelerators. If NeuronModel_suppressInputConversion was applied to the model to be compiled, the returned dimensions are the same as the original dimensions given by the user.

Available since 4.2.0

Parameters:
  • compilation – The compilation to be queried.

  • index – The index of the input operand we are querying. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs. It is not the index associated with NeuronModel_addOperand.

  • dimensions – The dimension array to be filled. The size of the array must be exactly as large as the rank of the input operand to be queried in the model.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_getOutputPaddedDimensions(NeuronCompilation *compilation, int32_t index, uint32_t *dimensions)

Get the padded dimensional information of the specified output operand of the compilation. This function must be called after calling NeuronCompilation_finish. If NeuronModel_suppressOutputConversion was not applied to the model to be compiled, the returned dimensions are the padded dimensions after NeuronCompilation_finish, which satisfy the optimization requirements of the underlying hardware accelerators. If NeuronModel_suppressOutputConversion was applied to the model to be compiled, the returned dimensions are the same as the original dimensions given by the user.

Available since 4.2.0

Parameters:
  • compilation – The compilation to be queried.

  • index – The index of the output operand we are querying. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs. It is not the index associated with NeuronModel_addOperand.

  • dimensions – The dimension array to be filled. The size of the array must be exactly as large as the rank of the output operand to be queried in the model.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_getInputPaddedSize(NeuronCompilation *compilation, int32_t index, size_t *size)

Get the expected buffer size (bytes) of the specified input operand of the compilation. If NeuronModel_suppressInputConversion was not applied to the model to be compiled, the returned size is the padded size after NeuronCompilation_finish, which satisfies the optimization requirements of the underlying hardware accelerators. If NeuronModel_suppressInputConversion was applied to the model to be compiled, the returned size is the same as the original size given by the user.

Available since 4.2.0

Parameters:
  • compilation – The compilation to be queried.

  • index – The index of the input operand we are querying. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs. It is not the index associated with NeuronModel_addOperand.

  • size – the expected buffer size in bytes.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_getOutputPaddedSize(NeuronCompilation *compilation, int32_t index, size_t *size)

Get the expected buffer size (bytes) of the specified output operand of the compilation. If NeuronModel_suppressOutputConversion was not applied to the model to be compiled, the returned size is the padded size after NeuronCompilation_finish, which satisfies the optimization requirements of the underlying hardware accelerators. If NeuronModel_suppressOutputConversion was applied to the model to be compiled, the returned size is the same as the original size given by the user.

Available since 4.2.0

Parameters:
  • compilation – The compilation to be queried.

  • index – The index of the output operand we are querying. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs. It is not the index associated with NeuronModel_addOperand.

  • size – the expected buffer size in bytes.

Returns:

NEURON_NO_ERROR if successful.
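A small sketch of querying the expected buffer sizes after NeuronCompilation_finish, for example when sizing the buffers passed later to NeuronExecution_setInput and NeuronExecution_setOutput. Index 0 assumes a model with at least one input and one output.

#include <stddef.h>
#include <stdio.h>
#include "NeuronAdapter.h"   /* assumed header name */

static void print_io_buffer_sizes(NeuronCompilation *compilation) {
    size_t inSize = 0, outSize = 0;
    if (NeuronCompilation_getInputPaddedSize(compilation, 0, &inSize) == NEURON_NO_ERROR) {
        printf("input 0 expects %zu bytes\n", inSize);
    }
    if (NeuronCompilation_getOutputPaddedSize(compilation, 0, &outSize) == NEURON_NO_ERROR) {
        printf("output 0 expects %zu bytes\n", outSize);
    }
}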

int NeuronCompilation_getCompiledNetworkSize(NeuronCompilation *compilation, size_t *size)

Get the compiled network size of the compilation.

This must be called after NeuronCompilation_finish and before NeuronExecution_create. It is not allowed to call this with a compilation restored from cache.

Available since 4.3.0

Parameters:
  • compilation – The compilation to be queried.

  • size – The compiled network size in bytes.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_storeCompiledNetwork( NeuronCompilation *compilation, void *buffer, const size_t size)

Store the compiled network.

Users have to allocate the buffer with the specified size before calling this function.

This must be called after NeuronCompilation_finish and before NeuronExecution_create. It is not allowed to call this with a compilation restored from cache.

Available since 4.3.0

Parameters:
  • compilation – The compilation to be queried.

  • buffer – User allocated buffer to store the compiled network.

  • size – Size of the user allocated buffer in bytes.

Returns:

NEURON_NO_ERROR if compiled network is successfully copied to the user allocated buffer.
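A sketch of serializing a finished compilation and later restoring it with NeuronModel_restoreFromCompiledNetwork; error handling is abbreviated and the heap buffer is only an example of where the compiled network might be kept.

#include <stdlib.h>
#include "NeuronAdapter.h"   /* assumed header name */

/* Serialize a finished compilation into a heap buffer (caller frees it). */
static void *store_network(NeuronCompilation *compilation, size_t *outSize) {
    size_t size = 0;
    if (NeuronCompilation_getCompiledNetworkSize(compilation, &size) != NEURON_NO_ERROR) return NULL;
    void *buffer = malloc(size);
    if (buffer == NULL) return NULL;
    if (NeuronCompilation_storeCompiledNetwork(compilation, buffer, size) != NEURON_NO_ERROR) {
        free(buffer);
        return NULL;
    }
    *outSize = size;
    return buffer;
}

/* Later, for example after the buffer has been written to and read back from
 * storage, rebuild an executable compilation from it. */
static int restore_network(const void *buffer, size_t size,
                           NeuronModel **model, NeuronCompilation **compilation) {
    return NeuronModel_restoreFromCompiledNetwork(model, compilation, buffer, size);
}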

int NeuronCompilation_setOptimizationHint(NeuronCompilation *compilation, uint32_t optimizationCode)

Hint the compiler to apply the optimization strategy according to the user specified parameters.

Available since 4.3.0

Parameters:
  • compilation – The compilation to be modified.

  • optimizationCode – User specified optimization strategy. Must be one of NEURON_OPTIMIZATION_* or the inclusive OR value of multiple NEURON_OPTIMIZATION_*.

Returns:

NEURON_NO_ERROR if successful.
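For example, low latency and deep fusion can be requested together by OR-ing the codes, as in this sketch (the 'compilation' handle is assumed to come from an earlier NeuronCompilation_create call):

/* Ask the compiler to favor low latency and reduced DRAM traffic. */
NeuronCompilation_setOptimizationHint(compilation,
        NEURON_OPTIMIZATION_LOW_LATENCY | NEURON_OPTIMIZATION_DEEP_FUSION);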

int NeuronCompilation_setOptimizationString(NeuronCompilation *compilation, const char *optimizationString)

Hint the compiler to apply the optimization strategy according to the user specified arguments in a null-terminated string.

Available since 4.6.0

Parameters:
  • compilation – The compilation to be modified.

  • optimizationString – A null-terminated string to represent the user specified optimization strategy.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_setTrimIOAlignment(NeuronCompilation *compilation, bool enable)

Hint compiler to trim the model IO alignment.

Available since 4.4.8

Parameters:
  • compilation – The compilation to be modified.

  • enable – ‘true’ for trimming model IO alignment.

Returns:

NEURON_NO_ERROR if successful.

int NeuronCompilation_setSWDilatedConv(NeuronCompilation *compilation, bool enable)

Hint compiler to use software dilated convolution

Available since 4.4.8

Parameters:
  • compilation – The compilation to be modified.

  • enable – ‘true’ indicates a hint to compiler to use software dilated convolution

Returns:

NEURON_NO_ERROR if successful.

int NeuronExecution_create(NeuronCompilation *compilation, NeuronExecution **execution)

Create a new execution instance by calling the NeuronExecution_create function. The provided compilation must outlive the execution.

Available since 4.1.0

Parameters:
  • compilation – The NeuronCompilation to be evaluated.

  • execution – The newly created object or NULL if unsuccessful.

Returns:

NEURON_NO_ERROR if successful

void NeuronExecution_free(NeuronExecution *execution)

Destroy an execution.

Available since 4.1.0

Parameters:

execution – The execution to be destroyed.

int NeuronExecution_setInput(NeuronExecution *execution, int32_t index, const NeuronOperandType *type, const void *buffer, size_t length)

Associate a user buffer with an input of the model of the NeuronExecution. The provided buffer must outlive the execution.

Available since 4.1.0

Parameters:
  • execution – The execution to be modified.

  • index – The index of the input argument we are setting. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs. It is not the index associated with NeuronModel_addOperand.

  • type – The NeuronOperandType of the operand. Currently NeuronAdapter only takes NULL.

  • buffer – The buffer containing the data.

  • length – The length in bytes of the buffer.

Returns:

NEURON_NO_ERROR if successful, NEURON_BAD_DATA if the name is not recognized or the buffer is too small for the input.

int NeuronExecution_setOutput(NeuronExecution *execution, int32_t index, const NeuronOperandType *type, void *buffer, size_t length)

Associate a user buffer with an output of the model of the NeuronExecution. The provided buffer must outlive the execution.

Available since 4.1.0

Parameters:
  • execution – The execution to be modified.

  • index – The index of the output argument we are setting. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs. It is not the index associated with NeuronModel_addOperand.

  • type – The NeuronOperandType of the operand. Currently NeuronAdapter only takes NULL.

  • buffer – The buffer where the data is to be written.

  • length – The length in bytes of the buffer.

Returns:

NEURON_NO_ERROR if successful, NEURON_BAD_DATA if the name is not recognized or the buffer is too small for the output.

int NeuronExecution_setInputFromMemory(NeuronExecution *execution, uint32_t index, const NeuronOperandType *type, const NeuronMemory *memory, size_t offset, size_t length)

Associate part of a memory object with an input of the model of the NeuronExecution.

The provided memory must outlive the execution and should not be changed during computation.

Available since 4.1.0

Parameters:
  • execution – The execution to be modified.

  • index – The index of the input argument we are setting. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs. It is not the index associated with NeuronModel_addOperand.

  • type – The NeuronOperandType of the operand. Currently NeuronAdapter only takes NULL.

  • memory – The memory containing the data.

  • offset – This specifies the location of the data within the memory. The offset is in bytes from the start of memory.

  • length – The size in bytes of the data value.

Returns:

NEURON_NO_ERROR if successful, NEURON_BAD_DATA if the name is not recognized or the buffer is too small for the input.

int NeuronExecution_setOutputFromMemory(NeuronExecution *execution, uint32_t index, const NeuronOperandType *type, const NeuronMemory *memory, size_t offset, size_t length)

Associate part of a memory object with an output of the model of the NeuronExecution.

The provided memory must outlive the execution and should not be changed during computation.

Available since 4.1.0

Parameters:
  • execution – The execution to be modified.

  • index – The index of the output argument we are setting. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs. It is not the index associated with NeuronModel_addOperand.

  • type – The NeuronOperandType of the operand. Currently NeuronAdapter only takes NULL.

  • memory – The memory containing the data.

  • offset – This specifies the location of the data within the memory. The offset is in bytes from the start of memory.

  • length – The size in bytes of the data value.

Returns:

NEURON_NO_ERROR if successful, NEURON_BAD_DATA if the name is not recognized or the buffer is too small for the output.

int NeuronExecution_compute(NeuronExecution *execution)

Schedule synchronous evaluation of the execution. Returns once the execution has completed and the outputs are ready to be consumed.

Available since 4.1.0

Parameters:

execution – The execution to be scheduled and executed.

Returns:

NEURON_NO_ERROR if the execution completed normally. NEURON_BAD_STATE if the inference fails. Two return codes were added in 5.0.0: NEURON_MISSED_DEADLINE_TRANSIENT if the inference times out, and NEURON_OUTPUT_INSUFFICIENT_SIZE if the given output size is not sufficient for the actual output.
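As a quick illustration of the calls above, the following minimal sketch binds one input and one output buffer and then runs a synchronous evaluation. It assumes an already-created NeuronExecution (creation is not shown), a model with exactly one input and one output, and that NeuronExecution_setInput mirrors the NeuronExecution_setOutput signature shown above; the buffer sizes are placeholders supplied by the caller.

#include <NeuronAdapter.h>   /* header name as used elsewhere in this reference */

static int run_synchronously(NeuronExecution *execution,
                             const void *in, size_t in_bytes,
                             void *out, size_t out_bytes)
{
    /* Pass NULL for the operand type; NeuronAdapter currently only takes NULL. */
    int err = NeuronExecution_setInput(execution, 0, NULL, in, in_bytes);
    if (err != NEURON_NO_ERROR) return err;

    err = NeuronExecution_setOutput(execution, 0, NULL, out, out_bytes);
    if (err != NEURON_NO_ERROR) return err;

    /* Blocks until the execution completes and the outputs are ready. */
    return NeuronExecution_compute(execution);
}

Both buffers must outlive the execution, as noted above.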

int NeuronExecution_startComputeWithDependencies(NeuronExecution *execution, const NeuronEvent *const *dependencies, uint32_t num_dependencies, uint64_t duration, NeuronEvent **event)

Schedule asynchronous evaluation of the execution with dependencies.

The execution will wait for all the depending events to be signaled before starting the evaluation. Once the execution has completed and the outputs are ready to be consumed, the returned event will be signaled. Depending on which devices are handling the execution, the event could be backed by a sync fence. Use NeuronEvent_wait to wait for that event.

NeuronEvent_wait must be called to reclaim the resources used by the execution.

If parts of the execution are scheduled on devices that do not support fenced execution, the function call may wait for such parts to finish before returning.

The function will return an error if any of the events in dependencies is already in a bad state. After the execution is scheduled, if any of the events in dependencies does not complete normally, the execution will fail, and NeuronEvent_wait on the returned event will return an error.

The function will return an error if any of the execution outputs has a tensor operand type that is not fully specified.

Available since 5.0.0

Parameters:
  • execution – The execution to be scheduled and executed.

  • dependencies – A set of depending events. The actual evaluation will not start until all the events are signaled.

  • num_dependencies – The number of events in the dependencies set.

  • duration – currently not used

  • event – The event that will be signaled on completion. event is set to NULL if there’s an error.

Returns:

NEURON_NO_ERROR if the evaluation is successfully scheduled.
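A minimal sketch of the asynchronous path, assuming the execution already has its inputs and outputs bound (see the setters above). It schedules the evaluation with no dependencies and then blocks on the returned event; NeuronEvent_wait must be called so that the execution's resources are reclaimed.

#include <NeuronAdapter.h>

static int run_asynchronously(NeuronExecution *execution)
{
    NeuronEvent *event = NULL;
    int err = NeuronExecution_startComputeWithDependencies(
        execution, /*dependencies=*/NULL, /*num_dependencies=*/0,
        /*duration=*/0 /* currently not used */, &event);
    if (err != NEURON_NO_ERROR) return err;

    /* Returns the final execution status once the event is signaled. */
    err = NeuronEvent_wait(event);
    NeuronEvent_free(event);
    return err;
}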

int NeuronExecution_setLoopTimeout(NeuronExecution *execution, uint64_t duration)

Set the maximum duration of WHILE loops in the specified execution.

Available since 5.0.0

Parameters:
  • execution – The execution to be modified.

  • duration – The maximum amount of time in nanoseconds.

Returns:

NEURON_NO_ERROR if successful.

uint64_t Neuron_getDefaultLoopTimeout()

Get the default timeout value for WHILE loops.

Available since 5.0.0

Returns:

The default timeout value in nanoseconds.

uint64_t Neuron_getMaximumLoopTimeout()

Get the maximum timeout value for WHILE loops.

Available since 5.0.0

Returns:

The maximum timeout value in nanoseconds.

int NeuronExecution_setBoostHint(NeuronExecution *execution, uint8_t boostValue)

Sets the execution boost hint associated with this execution. Required before calling NeuronExecution_compute.

Execution boost is the hint for the device frequency, ranging from 0 (lowest) to 100 (highest). For a compilation with the preference set to NEURON_PREFER_SUSTAINED_SPEED, the scheduler guarantees that the executing boost value equals the boost value hint.

On the other hand, for a compilation with the preference set to NEURON_PREFER_LOW_POWER, the scheduler tries to save power by configuring the executing boost value to a value that is not higher than the boost value hint.

Available since 4.1.0

Parameters:
  • execution – The execution to be modified.

  • boostValue – The hint for the device frequency, ranging from 0 (lowest) to 100 (highest).

Returns:

NEURON_NO_ERROR if successful.

int NeuronExecution_setCacheFlushHint(NeuronExecution *execution, uint8_t flushHint)

Sets the execution CPU cache flush hint associated with this execution. Required before calling NeuronExecution_setInputFromMemory and NeuronExecution_setOutputFromMemory.

The default value of the hint is NEURON_CACHE_FLUSH_ENABLE_ALL.

Available since 5.0.1

Parameters:
  • execution – The execution to be modified.

  • flushHint – Either NEURON_CACHE_FLUSH_ENABLE_ALL or the bitwise OR of one or more of the following flags: NEURON_CACHE_FLUSH_DISABLE_SYNC_INPUT, NEURON_CACHE_FLUSH_DISABLE_INVALIDATE_OUTPUT.

Returns:

NEURON_NO_ERROR if successful.

int NeuronExecution_getOutputOperandRank(NeuronExecution *execution, int32_t index, uint32_t *rank)

Get the dimensional information of the specified output operand of the model of the latest computation evaluated on NeuronExecution.

This function may only be invoked when the execution is in the completed state.

Available since 5.0.0

Parameters:
  • execution – The execution to be queried.

  • index – The index of the output argument we are querying. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs.

  • rank – The rank of the output operand.

Returns:

NEURON_NO_ERROR if successful.

int NeuronExecution_getOutputOperandDimensions(NeuronExecution *execution, int32_t index, uint32_t *dimensions)

Get the dimensional information of the specified output operand of the model of the latest computation evaluated on NeuronExecution. The target output operand cannot be a scalar.

This function may only be invoked when the execution is in the completed state.

Available since 5.0.0

Parameters:
  • execution – The execution to be queried.

  • index – The index of the output argument we are querying. It is an index into the lists passed to NeuronModel_identifyInputsAndOutputs.

  • dimensions – The dimension array to be filled. The size of the array must be exactly as large as the rank of the output operand to be queried in the model.

Returns:

NEURON_NO_ERROR if successful.
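The two queries above are typically combined once the execution has completed: first fetch the rank, then size the dimension array accordingly. A minimal sketch (using a C99 variable-length array for illustration):

#include <stdio.h>
#include <NeuronAdapter.h>

static int print_output_shape(NeuronExecution *execution, int32_t index)
{
    uint32_t rank = 0;
    int err = NeuronExecution_getOutputOperandRank(execution, index, &rank);
    if (err != NEURON_NO_ERROR || rank == 0) return err;

    uint32_t dims[rank];  /* exactly as large as the rank of the queried operand */
    err = NeuronExecution_getOutputOperandDimensions(execution, index, dims);
    if (err != NEURON_NO_ERROR) return err;

    for (uint32_t i = 0; i < rank; ++i)
        printf("dim[%u] = %u\n", i, dims[i]);
    return NEURON_NO_ERROR;
}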

int NeuronDebug_setReportPath(NeuronModel *model, const char *path)

Set report path for debug plus.

This function is used for debugging purposes only. The execution should be created from a compilation produced by NeuronCompilation_createForDebug.

Available since 5.0.0

Parameters:
  • model – The model to be debugged.

  • path – The path of execution report.

Returns:

NEURON_NO_ERROR if successful, NEURON_BAD_DATA if the path is empty.

int Neuron_getDeviceCount(uint32_t *numDevices)

Get the number of available devices.

Available since 4.1.0

Parameters:

numDevices – The number of devices returned.

Returns:

NEURON_NO_ERROR if successful.

int Neuron_getDevice(uint32_t devIndex, NeuronDevice **device)

Get the representation of the specified device.

Available since 4.1.0

Parameters:
  • devIndex – The index of the specified device. Must be less than the number of available devices.

  • device – The representation of the specified device. The same representation will always be returned for the specified device.

Returns:

NEURON_NO_ERROR if successful.

int NeuronDevice_getName(const NeuronDevice *device, const char **name)

Get the name of the specified device.

Available since 4.1.0

Parameters:
  • device – The representation of the specified device.

  • name – The returned name of the specified device. The name will remain valid for the duration of the application.

Returns:

NEURON_NO_ERROR if successful.

int NeuronDevice_getDescription(const NeuronDevice *device, const char **description)

Get the description of the specified device.

Available since 5.0.0

Parameters:
  • device – The representation of the specified device.

  • description – The returned description of the specified device. The description will remain valid for the duration of the application.

Returns:

NEURON_NO_ERROR if successful.
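A minimal sketch that enumerates the available devices and prints their names and descriptions using the four functions above:

#include <stdio.h>
#include <NeuronAdapter.h>

static void list_devices(void)
{
    uint32_t count = 0;
    if (Neuron_getDeviceCount(&count) != NEURON_NO_ERROR) return;

    for (uint32_t i = 0; i < count; ++i) {
        NeuronDevice *device = NULL;
        const char *name = NULL;
        const char *description = NULL;
        if (Neuron_getDevice(i, &device) != NEURON_NO_ERROR) continue;
        if (NeuronDevice_getName(device, &name) == NEURON_NO_ERROR &&
            NeuronDevice_getDescription(device, &description) == NEURON_NO_ERROR) {
            printf("device %u: %s (%s)\n", i, name, description);
        }
    }
}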

void NeuronEvent_free(NeuronEvent *event)
 
int NeuronEvent_wait(NeuronEvent *event)

Waits until the execution completes.

More than one thread can wait on an event. When the execution completes, all threads will be released.

See NeuronExecution for information on multithreaded usage.

Available since 5.0.0

Parameters:

event – The event that will be signaled on completion.

Returns:

NEURON_NO_ERROR if the execution completed normally. NEURON_UNMAPPABLE if the execution input or output memory cannot be properly mapped.

int NeuronEvent_createFromSyncFenceFd(int sync_fence_fd, NeuronEvent **event)

Create a NeuronEvent from a sync_fence file descriptor.

The newly created NeuronEvent does not take ownership of the provided sync_fence_fd; instead, it will dup the provided sync_fence_fd and own the duplicate.

Available since 5.0.0

Parameters:
  • sync_fence_fd – The sync_fence file descriptor.

  • event – The newly created object or NULL if unsuccessful.

Returns:

NEURON_NO_ERROR if successful.

int NeuronEvent_getSyncFenceFd(const NeuronEvent *event, int *sync_fence_fd)

Get sync_fence file descriptor from the event.

If the NeuronEvent is not backed by a sync fence, the sync_fence_fd will be set to -1, and NEURON_BAD_DATA will be returned.

See NeuronEvent_createFromSyncFenceFd and NeuronExecution_startComputeWithDependencies to see how to create an event backed by a sync fence.

The user takes ownership of the returned fd, and must close the returned file descriptor when it is no longer needed.

Available since 5.0.0

Parameters:
  • event – An event that is backed by a sync fence.

  • sync_fence_fd – The sync_fence file descriptor. The file descriptor will be set to -1 if there is an error.

Returns:

NEURON_NO_ERROR if successful.

int NeuronDevice_getExtensionSupport(const char *extensionName, bool *isExtensionSupported)

Queries whether an extension is supported by the driver implementation of the specified device.

Available since 5.0.0

Parameters:
  • extensionName – The extension name.

  • isExtensionSupported – The boolean value indicating whether the extension is supported.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_getExtensionOperandType(NeuronModel *model, const char *extensionName, uint16_t operandCodeWithinExtension, int32_t *type)

Creates an operand type from an extension name and an extension operand code.

See NeuronModel for information on multithreaded usage.

Available since 5.0.0

Parameters:
  • model – The model to contain the operand.

  • extensionName – The extension name.

  • operandCodeWithinExtension – The extension operand code.

  • type – The operand type.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_getExtensionOperationType(NeuronModel *model, const char *extensionName, uint16_t operationCodeWithinExtension, int32_t *type)

Creates an operation type from an extension name and an extension operation code.

See NeuronModel for information on multithreaded usage.

Available since 5.0.0

Parameters:
  • model – The model to contain the operation.

  • extensionName – The extension name.

  • operationCodeWithinExtension – The extension operation code.

  • type – The operation type.

Returns:

NEURON_NO_ERROR if successful.

int NeuronModel_setOperandExtensionData(NeuronModel *model, int32_t index, const void *data, size_t length)

Sets extension operand parameters.

Available since 5.0.0

Parameters:
  • model – The model to be modified.

  • index – The index of the model operand we’re setting.

  • data – A pointer to the extension operand data. The data does not have to outlive the call to this function.

  • length – The size in bytes of the data value.

Returns:

NEURON_NO_ERROR if successful.
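The typical extension flow is: check whether the driver supports the extension, resolve the extension operand (or operation) type, add the operand to the model, and attach its extension-specific data. The sketch below illustrates this; "com.example.my_ext", the operand code 0, the operand index 3, and the params payload are all hypothetical placeholders, and adding the operand itself (NeuronModel_addOperand) is assumed to have happened elsewhere.

#include <stdbool.h>
#include <NeuronAdapter.h>

static int use_extension(NeuronModel *model)
{
    bool supported = false;
    int err = NeuronDevice_getExtensionSupport("com.example.my_ext", &supported);
    if (err != NEURON_NO_ERROR || !supported) return err;

    int32_t ext_operand_type = 0;
    err = NeuronModel_getExtensionOperandType(model, "com.example.my_ext",
                                              /*operandCodeWithinExtension=*/0,
                                              &ext_operand_type);
    if (err != NEURON_NO_ERROR) return err;

    /* Attach extension-specific parameters to the (already added) operand
       at index 3. The data does not have to outlive this call. */
    const uint32_t params[2] = {1, 2};
    return NeuronModel_setOperandExtensionData(model, 3, params, sizeof(params));
}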

struct NeuronOperandType

#include <NeuronAdapter.h>

NeuronOperandType describes the type of an operand. This structure is used to describe both scalars and tensors.

Public Members

int32_t type

The data type, e.g. NEURON_INT8.

uint32_t dimensionCount

The number of dimensions. It should be 0 for scalars.

const uint32_t *dimensions

The dimensions of the tensor. It should be nullptr for scalars.

float scale

These two fields are only used for quantized tensors. They should be zero for scalars and non-fixed point tensors. The dequantized value of each entry is (value - zeroPoint) * scale.

int32_t zeroPoint

Only used with scale for quantized tensors
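For illustration, the sketch below fills a NeuronOperandType for an 8-bit asymmetric quantized NHWC tensor. The type code NEURON_TENSOR_QUANT8_ASYMM is assumed here for the sake of the example (only NEURON_INT8 is named in this reference); the other fields follow the member descriptions above.

#include <NeuronAdapter.h>

/* The dimensions array must stay valid for as long as the operand type is used. */
static const uint32_t kDims[4] = {1, 224, 224, 3};

static NeuronOperandType make_quant8_tensor_type(void)
{
    NeuronOperandType t;
    t.type = NEURON_TENSOR_QUANT8_ASYMM;  /* assumed type code, see lead-in */
    t.dimensionCount = 4;                 /* 0 for scalars */
    t.dimensions = kDims;                 /* nullptr for scalars */
    t.scale = 0.0078125f;                 /* dequantized = (value - zeroPoint) * scale */
    t.zeroPoint = 128;
    return t;
}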

struct NeuronSymmPerChannelQuantParams

#include <NeuronAdapter.h>

Parameters for NEURON_TENSOR_QUANT8_SYMM_PER_CHANNEL operand.

Public Members

uint32_t channelDim

The index of the channel dimension.

uint32_t scaleCount

The size of the scale array. Should be equal to dimension[channelDim] of the Operand.

const float *scales

The array of scaling values for each channel. Each value must be greater than zero.

struct NeuronRuntimeVersion

#include <NeuronAdapter.h>

The structure to represent the neuron version.

Public Members

uint8_t major

major version

uint8_t minor

minor version

uint8_t patch

patch version

 

5.4. Neuron API Reference

5.4.1. Fence.h

struct FenceInfo

#include <Fence.h>

This struct is used to receive the fence file descriptor and the post-inference callback in fenced execution. Specifically, the user should allocate this struct and pass its address into the fenced execution API. The fence FD and the callback will be set properly. After the fence is triggered, the caller can invoke the callback to retrieve the execution status and execution time.

Note

This struct is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Public Members

int64_t inputFenceFd

The file descriptor of the fence to be triggered before inference. Use -1 for this field if there is no inputFenceFd in the inference.

int64_t fenceFd

The file descriptor of the fence to be triggered at the end of inference.

void (*callback)(void *opaque)

The caller should invoke this callback after the fence is triggered to retrieve the execution status and time. The caller should pass the address of the original FenceInfo that owns this callback in the first parameter 'opaque'.

uint32_t status

Execution status. This will be set after callback is called.

uint32_t microseconds

Execution time. This will be set after callback is called.

uint64_t __internal__[4]

This data is for internal use. Do not access it.

file Fence.h

#include <stdint.h>

#include <sys/cdefs.h>

Functions

int NeuronRuntime_isFenceSupported(void *runtime, uint8_t *supported)

Check if the model supports fenced execution. Call this function after the runtime has loaded the model.

Note

This function is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • supported – Non-zero value indicates that the model supports fenced execution.

Returns:

An error code indicating whether the check executed successfully.

int NeuronRuntime_inferenceFenced(void *runtime, FenceInfo *fenceInfo)

Perform fenced inference. This call returns without waiting for the inference to finish. The caller should prepare a FenceInfo structure and pass its address into this API. The fenceFd in FenceInfo will be set, and the caller can be signaled when inference completes (or exits with an error) by waiting on the fence. Most importantly, after the fence is triggered, the caller MUST call the callback in fenceInfo so that Neuron can perform certain post-execution tasks. The final execution status and inference time can be retrieved from FenceInfo after the callback is executed.

Note

This function is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • fenceInfo – The struct used to receive the fence file descriptor and the post-inference callback in fenced execution.

Returns:

A Runtime error code.
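Putting the two functions together, the sketch below checks for fence support, issues a fenced inference, waits for the fence to be triggered, and then invokes the mandatory callback to collect the status and execution time. Waiting on the fence FD via poll() is an assumption made for illustration; use your platform's preferred sync-fence wait. The include paths are likewise assumptions.

#include <poll.h>
#include "Fence.h"
#include "Types.h"

static int fenced_run(void *runtime)
{
    uint8_t supported = 0;
    if (NeuronRuntime_isFenceSupported(runtime, &supported) != NEURONRUNTIME_NO_ERROR
        || !supported)
        return -1;

    FenceInfo info = {0};
    info.inputFenceFd = -1;  /* no input fence for this inference */

    int err = NeuronRuntime_inferenceFenced(runtime, &info);
    if (err != NEURONRUNTIME_NO_ERROR) return err;

    /* Wait for the output fence to be triggered (sketch: poll on the FD). */
    struct pollfd pfd = { .fd = (int)info.fenceFd, .events = POLLIN };
    poll(&pfd, 1, /*timeout_ms=*/-1);

    /* The callback MUST be invoked so Neuron can run its post-execution tasks;
       it fills info.status and info.microseconds. */
    info.callback(&info);
    return (int)info.status;
}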

5.4.2. Misc.h

file Misc.h

#include <sys/cdefs.h>

#include "Types.h"

Misc Neuron Runtime API

Miscellaneous functionality

Functions

int NeuronRuntime_getVersion(NeuronVersion *version)

Get the version of Neuron runtime library.

Note

Neuron runtime can only load DLA files generated by compiler with the same major version.

Parameters:

version – the version of Neuron runtime library.

Returns:

A RuntimeAPI error code.
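Because the runtime can only load DLA files generated by a compiler with the same major version, it is common to check the version up front. A minimal sketch (the expected major version and the include path are caller-supplied assumptions):

#include "Misc.h"   /* include path is an assumption */

static int check_runtime_major(uint8_t expected_major)
{
    NeuronVersion version;
    if (NeuronRuntime_getVersion(&version) != NEURONRUNTIME_NO_ERROR) return -1;
    /* DLA files are only loadable when the major versions match. */
    return (version.major == expected_major) ? 0 : -1;
}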

5.4.3. RuntimeAPI.h

struct BufferAttribute

#include <RuntimeAPI.h>

BufferAttribute is used to inform the runtime whether this buffer is an ION buffer. If ionFd is -1, the buffer is a non-ION buffer. Otherwise, the buffer is an ION buffer and ionFd is its shared ION buffer file descriptor. Android device implementations may benefit from this information to eliminate unnecessary data copy.

Public Members

int ionFd

-1: Non-ION buffer.

struct EnvOptions

Public Members

uint32_t deviceKind

 

Device kind can be chosen from kEnvOptNullDevice, kEnvOptCModelDevice, or kEnvOptHardware.

For hardware development, use kEnvOptHardware.

 

MDLACoreMode MDLACoreOption

Set MDLA core option.

Warning

This option is no longer effective. To be removed in Neuron 6.0

uint8_t CPUThreadNum

A hint for CPU backends to use this number of threads for execution.

bool suppressInputConversion

Set this to true to bypass preprocessing and feed data in the format that the device demands.

bool suppressOutputConversion

Set this to true to bypass postprocessing and retrieve the raw device output.

file RuntimeAPI.h

#include "neuron/api/Types.h"

#include <stddef.h>

#include <stdint.h>

#include <sys/cdefs.h>

Neuron Runtime API

 

Neuron provides APIs to create a runtime environment, parse a compiled model file, and run inference with a network.

Runtime users should include this header to use the Runtime API. Note that some APIs that set input and output information require the user to specify the handle of the input/output tensor they want to set.

The user may:

1) Act as ANN or TFLite, which always knows the handle.

2) Run a precompiled network, understanding the model from the beginning.

3) Run a precompiled network without knowing what the network looks like. In this case, the user cannot perform inference, or even provide a valid input with a valid input shape, without first inspecting the network IO map information. After checking the IO map, the user acquires the handles and the corresponding shapes.
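A minimal sketch of the simplest case, a single-input/single-output network loaded from a DLA file, using the functions declared later in this header. Error handling is abbreviated, the buffers are caller-supplied placeholders, the include path is an assumption, and fields of EnvOptions that are not set explicitly are left at zero.

#include "RuntimeAPI.h"   /* include path is an assumption */

static int run_model(const char *dla_path,
                     const void *in, size_t in_len,
                     void *out, size_t out_len)
{
    void *runtime = NULL;
    EnvOptions options = { .deviceKind = kEnvOptHardware };
    BufferAttribute attr = { .ionFd = -1 };  /* non-ION buffers */

    if (NeuronRuntime_create(&options, &runtime) != NEURONRUNTIME_NO_ERROR)
        return -1;
    if (NeuronRuntime_loadNetworkFromFile(runtime, dla_path) != NEURONRUNTIME_NO_ERROR ||
        NeuronRuntime_setSingleInput(runtime, in, in_len, attr) != NEURONRUNTIME_NO_ERROR ||
        NeuronRuntime_setSingleOutput(runtime, out, out_len, attr) != NEURONRUNTIME_NO_ERROR ||
        NeuronRuntime_inference(runtime) != NEURONRUNTIME_NO_ERROR) {
        NeuronRuntime_release(runtime);
        return -1;
    }
    NeuronRuntime_release(runtime);
    return 0;
}

For networks with multiple inputs or outputs, use the handle-based setters (NeuronRuntime_setInput / NeuronRuntime_setOutput) and query the expected sizes with NeuronRuntime_getInputSize / NeuronRuntime_getOutputSize.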

 

Defines

NON_ION_FD
 

Enums

enum MDLACoreMode

This option controls whether the underlying hardware should split and run a graph across homogeneous devices. Note that this does not control the heterogeneous parallelism in the Runtime software.

Warning

This option is to be deprecated in Neuron 6.0

Values:

enumerator Auto

Scheduler decide.

enumerator Single

Force single MDLA.

enumerator Dual

Force multi MDLA.

Functions

inline int IsNullDevice(const EnvOptions *options)
Parameters:

options – The environment options for the Neuron Runtime.

Returns:

1 if the user-specified EnvOptions use a NullDevice; otherwise, 0.

inline int IsHardware(const EnvOptions *options)
Parameters:

options – The environment options for the Neuron Runtime.

Returns:

1 if the user-specified EnvOptions use real hardware; otherwise, 0.

int NeuronRuntime_create(const EnvOptions *optionsToDeprecate, void **runtime)

Create a Neuron Runtime based on the setting specified in options. The address of the created instance will be passed back in *runtime.

Parameters:
  • optionsToDeprecate – The environment options for the Neuron Runtime (To be deprecated).

  • runtime – Runtime provides API for applications to run a compiled network on specified input.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_create_with_options(const char *options, const EnvOptions *optionsToDeprecate, void **runtime)

Create a Neuron Runtime based on the setting specified in options. The address of the created instance will be passed back in *runtime.

Parameters:
  • options – The environment options for the Neuron Runtime.

  • optionsToDeprecate – The environment options for the Neuron Runtime (To be deprecated).

  • runtime – Runtime provides API for applications to run a compiled network on specified input.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_loadNetworkFromFile(void *runtime, const char *pathToDlaFile)

Load the compiled network from dla file.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • pathToDlaFile – The dla file path.

Returns:

A RuntimeAPI error code. 0 indicating load network successfully.

int NeuronRuntime_loadNetworkFromBuffer(void *runtime, const void *buffer, size_t size)

Load the compiled network from a memory buffer.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • buffer – The memory buffer.

  • size – The size of the buffer.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_setInput(void *runtime, uint64_t handle, const void *buffer, size_t length, BufferAttribute attribute)

Set the memory buffer for the tensor which holds the specified input handle in the original network. If there are multiple inputs, each of them has to be set.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • buffer – The input buffer.

  • length – The input buffer size.

  • attribute – The buffer attribute for setting ION.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_setOffsetedInput(void *runtime, uint64_t handle, const void *buffer, size_t length, BufferAttribute attribute, size_t offset)

Set the memory buffer and offset for the tensor which holds the specified input handle in the original network. If there are multiple inputs, each of them has to be set.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • buffer – The input buffer.

  • length – The input buffer size.

  • attribute – The buffer attribute for setting ION.

  • offset – The offset into the ION buffer. The input is read from the buffer's start address plus this offset.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_setSingleInput(void *runtime, const void *buffer, size_t length, BufferAttribute attribute)

If there is only one input, this function can set the buffer to the input automatically. Otherwise, NEURONRUNTIME_INCOMPLETE is returned.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • buffer – The input buffer.

  • length – The input buffer size.

  • attribute – The buffer attribute for setting ION.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_setInputShape(void *runtime, uint64_t handle, uint32_t *dims, uint32_t rank)

Set the shape for the input tensor which holds the specified input handle in the original network. If there are multiple inputs with dynamic shapes, each of them has to be set. This API is only used when the input has a dynamic shape; otherwise, an error code will be returned.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – An array of dimension sizes for each dimension. For NHWC, dims[0] is N.

  • rank – The input rank. For example, the rank is 4 for NHWC.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_setOutput(void *runtime, uint64_t handle, void *buffer, size_t length, BufferAttribute attribute)

Set the memory buffer for the tensor which holds the specified output handle in the original network. If there are multiple outputs, each of them has to be set.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • buffer – The output buffer.

  • length – The output buffer size.

  • attribute – The buffer attribute for setting ION.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_setOffsetedOutput(void *runtime, uint64_t handle, void *buffer, size_t length, BufferAttribute attribute, size_t offset)

Set the memory buffer and offset for the tensor which holds the specified output handle in the original network. If there are multiple outputs, each of them has to be set.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • buffer – The output buffer.

  • length – The output buffer size.

  • attribute – The buffer attribute for setting ION.

  • offset – The offset into the ION buffer. The output is written starting at the buffer's start address plus this offset.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_setSingleOutput(void *runtime, void *buffer, size_t length, BufferAttribute attribute)

If there is only one output, this function can set the buffer to the output automatically. Otherwise, NEURONRUNTIME_INCOMPLETE is returned.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • buffer – The output buffer.

  • length – The output buffer size.

  • attribute – The buffer attribute for setting ION.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_setQoSOption(void *runtime, const QoSOptions *qosOption)

Set the QoS configuration for Neuron Runtime. If qosOption.profiledQoSData is not nullptr, Neuron Runtime would use it as the profiled QoS data.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • qosOption – The option for QoS configuration.

Returns:

A RuntimeAPI error code.
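A minimal sketch of configuring QoS before inference: it requests high priority and a performance preference with the maximum boost value hint, and leaves profiledQoSData as NULL because no previously profiled data is reused. Fields not set explicitly are zeroed; see the QoSOptions description in Types.h for the full list. The include path is an assumption.

#include <string.h>
#include "RuntimeAPI.h"   /* include path is an assumption */

static int prefer_performance(void *runtime)
{
    QoSOptions qos;
    memset(&qos, 0, sizeof(qos));  /* leave unset fields at zero */

    qos.preference = NEURONRUNTIME_PREFER_PERFORMANCE;
    qos.priority = NEURONRUNTIME_PRIORITY_HIGH;
    qos.boostValue = NEURONRUNTIME_BOOSTVALUE_MAX;          /* 100 */
    qos.delayedPowerOffTime = NEURONRUNTIME_POWER_OFF_TIME_DEFAULT;
    qos.powerPolicy = NEURONRUNTIME_POWER_POLICY_DEFAULT;
    qos.applicationType = NEURONRUNTIME_APP_NORMAL;
    qos.profiledQoSData = NULL;   /* no profiled data to reuse */

    return NeuronRuntime_setQoSOption(runtime, &qos);
}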

int NeuronRuntime_getInputSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the input tensor (specified by handle). Pass back the expected buffer size (byte) in *size for the tensor which holds the specified input handle.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The input buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getInputRank(void *runtime, uint64_t handle, uint32_t *rank)

Get the rank required by the input tensor (specified by handle). Pass back the expected rank in *rank for the tensor which holds the specified input handle.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • rank – The input rank.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getSingleInputSize(void *runtime, size_t *size)

If there is only one input, this function can get the physical size required by the buffer of input and return the expected buffer size (byte) in *size. Otherwise, NEURONRUNTIME_INCOMPLETE is returned.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • size – The input buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getInputPaddedSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the input tensor (specified by handle) with hardware alignments. This function passes back the expected buffer size (byte) in *size for the tensor which holds the specified input handle. The value in *size has been aligned to hardware required size, and it can be used as ION buffer size for the specified input when suppressInputConversion is enabled.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The input buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getSingleInputPaddedSize(void *runtime, size_t *size)

If there is only one input, this function passes back the expected size (byte) of its buffer in *size. The value in *size has been aligned to hardware required size, and it can be used as ION buffer size for input when suppressInputConversion is enabled. Otherwise, the returned value is NEURONRUNTIME_INCOMPLETE.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • size – The input buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getInputPaddedDimensions(void *runtime, uint64_t handle, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the input tensor (specified by handle). This function passes back the expected size (in pixels) of each dimension in *dims for the tensor which holds the specified input handle. The sizes in *dims have been aligned to hardware-required sizes. When suppressInputConversion is enabled, the values in *dims are the required sizes of each dimension for the specified input.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – The size (in pixels) of each dimension.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getSingleInputPaddedDimensions(void *runtime, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the only input. If there is only one input, this function passes back the expected size (in pixels) of each dimension in *dims; otherwise, NEURONRUNTIME_INCOMPLETE is returned. The sizes in *dims have been aligned to hardware-required sizes. If suppressInputConversion is enabled, the values in *dims are the required sizes of each dimension for the input.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • dims – The size (in pixels) of each dimension.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getOutputSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the output tensor (specified by handle). This function passes back the expected buffer size (byte) in *size for the tensor which holds the specified output handle.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The output buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getSingleOutputSize(void *runtime, size_t *size)

Get the physical size required by the buffer of the only output. If there is only one Output, this function passes back the expected size (byte) of its buffer in *size. Otherwise, NEURONRUNTIME_INCOMPLETE is returned.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • size – The output buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getOutputPaddedSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the output tensor (specified by handle) with hardware alignments. This function passes back the expected buffer size (byte) in *size for the tensor which holds the specified output handle. The value in *size has been aligned to hardware required size, and it can be used as ION buffer size for the specified output when suppressOutputConversion is enabled.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The output buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getSingleOutputPaddedSize(void *runtime, size_t *size)

Get the physical size required by the buffer of the only output with hardware alignments. If there is only one Output, this function passes back the expected size (byte) of its buffer in *size. The value in *size has been aligned to hardware required size, and it can be used as ION buffer size for output when suppressOutputConversion is enabled. Otherwise, the returned value is NEURONRUNTIME_INCOMPLETE.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • size – The output buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getOutputPaddedDimensions(void *runtime, uint64_t handle, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the output tensor (specified by handle). This function passes back the expected size (in pixels) of each dimension in *dims for the tensor which holds the specified output handle. The sizes in *dims have been aligned to hardware-required sizes. When suppressOutputConversion is enabled, the values in *dims are the required sizes of each dimension for the specified output.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – The size (in pixels) of each dimension.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getSingleOutputPaddedDimensions(void *runtime, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the only output. If there is only one output, this function passes back the expected size (in pixels) of each dimension in *dims; otherwise, NEURONRUNTIME_INCOMPLETE is returned. The sizes in *dims have been aligned to hardware-required sizes. If suppressOutputConversion is enabled, the values in *dims are the required sizes of each dimension for the output.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • dims – The size (in pixels) of each dimension.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getProfiledQoSData(void *runtime, ProfiledQoSData **profiledQoSData, uint8_t *execBoostValue)

Get the profiled QoS data and executing boost value (the actual boost value during execution). If *profiledQoSData is nullptr, Neuron Runtime would allocate *profiledQoSData. Otherwise, Neuron Runtime would only update its fields. *profiledQoSData is actually allocated as a smart pointer in Neuron Runtime instance, so the lifetime of *profiledQoSData is the same as Neuron Runtime. Caller should be careful about the usage of *profiledQoSData, and never touch the allocated *profiledQoSData after NeuronRuntime_release.

Note

This function is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • profiledQoSData – The profiled QoS raw data.

  • execBoostValue – The executing boost value (the actual boost value set in the device) based on the scheduling policy.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_inference(void *runtime)

Do inference.

Parameters:

runtime – The address of the created neuron runtime instance.

Returns:

A RuntimeAPI error code.

void NeuronRuntime_release(void *runtime)

Release the runtime resource.

Parameters:

runtime – The address of the created neuron runtime instance.

int NeuronRuntime_getVersion(NeuronVersion *version)

Get the version of Neuron runtime library.

Note

Neuron runtime can only load DLA files generated by compiler with the same major version.

Parameters:

version – the version of Neuron runtime library.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getMetadataInfo(void *runtime, const char *key, size_t *size)

Get metadata info in the DLA file, which is provided through the compiler option --dla-metadata.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • key – The key for the target data

  • size – The size of the target data. If there is no corresponding metadata, size is 0.

Returns:

A RuntimeAPI error code.

int NeuronRuntime_getMetadata(void *runtime, const char *key, char *data, size_t size)

Get metadata in the DLA file, which is provided through the compiler option --dla-metadata.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • key – The key for the target data

  • data – The destination data buffer.

  • size – The size to read from metadata.

Returns:

A RuntimeAPI error code.
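The two metadata calls are meant to be used together: first query the size of the entry for a key, then read it into a buffer of that size. A minimal sketch (the key is a caller-supplied placeholder, the include path is an assumption, and the caller owns and frees the returned buffer):

#include <stdlib.h>
#include "RuntimeAPI.h"   /* include path is an assumption */

static char *read_metadata(void *runtime, const char *key, size_t *out_size)
{
    size_t size = 0;
    if (NeuronRuntime_getMetadataInfo(runtime, key, &size) != NEURONRUNTIME_NO_ERROR
        || size == 0)
        return NULL;  /* no metadata stored under this key */

    char *data = (char *)malloc(size);
    if (data == NULL) return NULL;

    if (NeuronRuntime_getMetadata(runtime, key, data, size) != NEURONRUNTIME_NO_ERROR) {
        free(data);
        return NULL;
    }
    *out_size = size;
    return data;
}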

Variables

const unsigned char kEnvOptNullDevice = 1 << 0

Possible values for the unsigned char deviceKind field.

const unsigned char kEnvOptCModelDevice = 1 << 1
 
const unsigned char kEnvOptHardware = 1 << 2
 
const unsigned char kEnvOptPredictor = 1 << 3
 

5.4.4. RuntimeV2.h

struct AsyncInferenceRequest

#include <RuntimeV2.h>

AsyncInferenceRequest represents a single inference request to be enqueued into the Runtime. Note that all the data pointed to by pointers in AsyncInferenceRequest must remain valid until the inference of that request is complete.

Public Members

IOBuffer *inputs

A pointer to the array of input buffer descriptions. The number of elements should equal the result of NeuronRuntimeV2_getInputNumber().

IOBuffer *outputs

A pointer to the array of output buffer descriptions. The number of elements should equal the result of NeuronRuntimeV2_getOutputNumber().

void (*finish_cb)(uint64_t job_id, void *opaque, int status)

A callback function specified by the user for the runtime to notify inference completion. When it is called, the ID of the job that has just finished and the opaque pointer in the original request are passed back in 'job_id' and 'opaque'. The execution status is given by 'status'. A zero status indicates success; otherwise, the inference job has failed.

void *opaque

A pointer to an opaque data, which will be passed back when finish_cb is called.

struct IOBuffer

#include <RuntimeV2.h>

IOBuffer is a descriptor describing the buffer which will be used as an inference input or output. Users should zero the whole IOBuffer, then fill these fields with valid data.

Public Members

void *buffer
 
size_t length
 
int fd
 
int offset
 
uint32_t reserved1_should_be_init_zero
 
uint64_t reserved2_should_be_init_zero
 
uint64_t reserved3_should_be_init_zero
 
struct SyncInferenceRequest

#include <RuntimeV2.h>

SyncInferenceRequest represents a synchronous inference request to run in the Runtime. The call will block until the inference finishes.

Public Members

IOBuffer *inputs

A pointer to the array of input buffer descriptions. The number of elements should equal the result of NeuronRuntimeV2_getInputNumber().

IOBuffer *outputs

A pointer to the array of output buffer descriptions. The number of elements should equal the result of NeuronRuntimeV2_getOutputNumber().

file RuntimeV2.h

#include "Types.h"

#include <stddef.h>

#include <stdint.h>

#include <sys/cdefs.h>

RuntimeV2.

The NeuronRuntimeV2 API allows users to create a NeuronRuntimeV2 instance from a specified DLA file. Users can enqueue asynchronous inference requests into the created runtime, or issue conventional synchronous requests.

Functions

int NeuronRuntimeV2_create(const char *pathToDlaFile, size_t nbThreads, void **runtime, size_t backlog = 2048u)

Create a NeuronRuntimeV2 based on the settings specified in options. It acts as a thread pool, waiting to accept AsyncInferenceRequest or SyncInferenceRequest on a DLA file. When the runtime receives a request, it enqueues the request into its backlog ring buffer, and the internal load balancer dispatches the request to the appropriate thread for execution. However, there is no guarantee on the order of completion of AsyncInferenceRequest; the user-specified callback should be aware of this. SyncInferenceRequest, on the other hand, always blocks until the request finishes. The address of the created runtime instance will be passed back in *runtime.

Parameters:
  • pathToDlaFile – The DLA file path.

  • nbThreads – The number of threads in the runtime.

  • runtime – The pointer will be modified to the created NeuronRuntimeV2 instance on success.

  • backlog – The maximum size of the backlog ring buffer. This should be smaller than 65536.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_create_with_options(const char *pathToDlaFile, size_t nbThreads, void **runtime, size_t backlog, const char *options)

Like NeuronRuntimeV2_create(), but it takes an additional option string.

Parameters:
  • pathToDlaFile – The DLA file path.

  • nbThreads – The number of threads in the runtime.

  • runtime – The pointer will be modified to the created NeuronRuntimeV2 instance on success.

  • backlog – The maximum size of the backlog ring buffer. This should be smaller than 65536.

  • options – A null-terminated C-string specifying runtime options.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_createFromBuffer(const void *buffer, size_t len, size_t nbThreads, void **runtime, size_t backlog = 2048u)

Like NeuronRuntimeV2_create(), but it creates the Runtime instance from a memory buffer containing the DLA data.

Parameters:
  • buffer – The DLA data buffer.

  • len – The DLA data buffer size.

  • nbThreads – The number of threads in the runtime.

  • runtime – The pointer will be modified to the created NeuronRuntimeV2 instance on success.

  • backlog – The maximum size of the backlog ring buffer. This should be smaller than 65536.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_createFromBuffer_with_options(const void *buffer, size_t len, size_t nbThreads, void **runtime, size_t backlog, const char *options)

Like NeuronRuntimeV2_createFromBuffer(), but it takes an additional option string.

Parameters:
  • buffer – The DLA data buffer.

  • len – The DLA data buffer size.

  • nbThreads – The number of threads in the runtime.

  • runtime – The pointer will be modified to the created NeuronRuntimeV2 instance on success.

  • backlog – The maximum size of the backlog ring buffer. This should be smaller than 65536.

  • options – A null-terminated C-string specifying runtime options.

Returns:

A RuntimeAPI error code.

void NeuronRuntimeV2_release(void *runtime)

Release the runtime. Calling this function will block until all requests finish.

Parameters:

runtime – The address of the created NeuronRuntimeV2 instance.

int NeuronRuntimeV2_enqueue(void *runtime, AsyncInferenceRequest request, uint64_t *job_id)

Enqueue one AsyncInferenceRequest. If the backlog ring buffer is not full, this function returns immediately, and the runtime will execute the request asynchronously. If the backlog is full (due to back pressure from execution), this call will block until the backlog ring buffer releases at least one available slot for the request. A unique ID is returned for the enqueued request in *job_id. The ID sequence starts from zero and increases with each received request. The 2^64 capacity for job IDs should be enough for any application.

Parameters:
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • request – The asynchronous inference request

  • job_id – The ID for this request is filled into *job_id. Later the ID will be passed back when the finish_cb is called.

Returns:

A RuntimeAPI error code.
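A minimal sketch of enqueueing one asynchronous request for a model assumed to have exactly one input and one output. Because every pointer reachable from the request must stay valid until the inference completes, the IOBuffer pair is heap-allocated here and released from the completion callback via the opaque pointer. The include path is an assumption.

#include <stdlib.h>
#include <string.h>
#include "RuntimeV2.h"   /* include path is an assumption */

typedef struct {
    IOBuffer input;
    IOBuffer output;
} RequestCtx;

static void on_finish(uint64_t job_id, void *opaque, int status)
{
    (void)job_id;
    (void)status;   /* zero indicates success */
    free(opaque);   /* release the RequestCtx allocated below */
}

static int enqueue_once(void *runtime, void *in, size_t in_len,
                        void *out, size_t out_len)
{
    RequestCtx *ctx = (RequestCtx *)calloc(1, sizeof(*ctx));  /* zero the IOBuffers */
    if (ctx == NULL) return -1;
    ctx->input.buffer = in;    ctx->input.length = in_len;
    ctx->output.buffer = out;  ctx->output.length = out_len;

    AsyncInferenceRequest req;
    memset(&req, 0, sizeof(req));
    req.inputs = &ctx->input;    /* one element: model assumed to have one input */
    req.outputs = &ctx->output;  /* one element: model assumed to have one output */
    req.finish_cb = on_finish;
    req.opaque = ctx;

    uint64_t job_id = 0;
    int err = NeuronRuntimeV2_enqueue(runtime, req, &job_id);
    if (err != NEURONRUNTIME_NO_ERROR) free(ctx);
    return err;
}

The in and out data buffers must likewise remain valid until on_finish fires.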

int NeuronRuntimeV2_run(void *runtime, SyncInferenceRequest request)

Perform a synchronous inference request. The request will be also enqueued into the Runtime ring buffer as NeuronRuntimeV2_enqueue() does. However, the call will block until the request finishes.

Parameters:
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • request – The synchronous inference request

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputNumber(void *runtime, size_t *size)

Get the number of inputs of the model in the runtime. The number of inputs will be passed back in *size.

Parameters:
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • size – The pointer to a size_t to store the passed back value.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getOutputNumber(void *runtime, size_t *size)

Get the number of outputs of the model in the runtime. The number of outputs will be passed back in *size.

Parameters:
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • size – The pointer to a size_t to store the passed back value.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputRank(void *runtime, uint64_t handle, uint32_t *rank)

Get the rank required by the input tensor (specified by handle). Pass back the expected rank in *rank for the tensor which holds the specified input handle.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • rank – The input rank.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the input tensor (specified by handle). Pass back the expected buffer size (byte) in *size for the tensor which holds the specified input handle.

Parameters:
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • handle – The frontend IO index.

  • size – The input buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getOutputSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the output tensor (specified by handle). This function passes back the expected buffer size (byte) in *size for the tensor which holds the specified output handle.

Parameters:
  • runtime – The address of the created NeuronRuntimeV2 instance.

  • handle – The frontend IO index.

  • size – The output buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputPaddedSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the input tensor (specified by handle) with hardware alignments. This function passes back the expected buffer size (byte) in *size for the tensor which holds the specified input handle. The value in *size has been aligned to hardware required size, and it can be used as ION buffer size for the specified input when suppressInputConversion is enabled.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The input buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getInputPaddedDimensions(void *runtime, uint64_t handle, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the input tensor (specified by handle). This function passes back the expected size (in pixels) of each dimension in *dims for the tensor which holds the specified input handle. The sizes in *dims have been aligned to hardware-required sizes. When suppressInputConversion is enabled, the values in *dims are the required sizes of each dimension for the specified input.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – The size (in pixels) of each dimension.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getOutputPaddedSize(void *runtime, uint64_t handle, size_t *size)

Get the physical size required by the buffer of the output tensor (specified by handle) with hardware alignments. This function passes back the expected buffer size (byte) in *size for the tensor which holds the specified output handle. The value in *size has been aligned to hardware required size, and it can be used as ION buffer size for the specified output when suppressOutputConversion is enabled.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • size – The output buffer size.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getOutputPaddedDimensions(void *runtime, uint64_t handle, RuntimeAPIDimensions *dims)

Get the size in pixels of each dimension of the output tensor (specified by handle). This function passes back the expected size (in pixels) of each dimension in *dims for the tensor which holds the specified output handle. The sizes in *dims have been aligned to hardware-required sizes. When suppressOutputConversion is enabled, the values in *dims are the required sizes of each dimension for the specified output.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • handle – The frontend IO index.

  • dims – The size (in pixels) of each dimension.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_setQoSOption(void *runtime, const QoSOptions *qosOption)

Set the QoS configuration for Neuron Runtime. If qosOption.profiledQoSData is not null, Neuron Runtime would use it to store the profiled QoS data.

Note: qosOption.profiledQoSData has no effect at all.

Note: Using this API while NeuronRuntimeV2 is working leads to undefined behavior. Namely, this API should be used only when all requests have finished and no new request is being issued.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • qosOption – The option for QoS configuration.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getProfiledQoSData(void *runtime, ProfiledQoSData **profiledQoSData, uint8_t *execBoostValue)

Get the profiled QoS data and executing boost value (the actual boost value during execution). If *profiledQoSData is nullptr, Neuron Runtime would allocate *profiledQoSData. Otherwise, Neuron Runtime would only update its fields. *profiledQoSData is actually allocated as a smart pointer in Neuron Runtime instance, so the lifetime of *profiledQoSData is the same as Neuron Runtime. Caller should be careful about the usage of *profiledQoSData, and never touch the allocated *profiledQoSData after NeuronRuntime_release.

Note: Only effective when NeuronRuntimeV2 has nbThreads = 1.

Note: Using this API while NeuronRuntimeV2 is working leads to undefined behavior. Namely, this API should be used only when all requests have finished and no new request is being issued.

Note

This function is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • profiledQoSData – The profiled QoS raw data.

  • execBoostValue – The executing boost value (the actual boost value set in the device) based on the scheduling policy.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getMetadataInfo(void *runtime, const char *key, size_t *size)

Get metadata info in the DLA file, which is provided through the compiler option --dla-metadata.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • key – The key for the target data

  • size – The size of the target data. If there is no corresponding metadata, size is 0.

Returns:

A RuntimeAPI error code.

int NeuronRuntimeV2_getMetadata(void *runtime, const char *key, char *data, size_t size)

Get metadata in the DLA file, which is provided through the compiler option --dla-metadata.

Parameters:
  • runtime – The address of the created neuron runtime instance.

  • key – The key for the target data

  • data – The destination data buffer.

  • size – The size to read from metadata.

Returns:

A RuntimeAPI error code.

5.4.5. Types.h

struct NeuronVersion

#include <Types.h>

The structure to represent the neuron version.

Public Members

uint8_t major
 
uint8_t minor
 
uint8_t patch
 
struct ProfiledQoSData

#include <Types.h>

Maintain the profiled QoS raw data.

Public Members

QoSData **qosData

 

Maintains the profiled QoS raw data as a pointer to pointers.

This field could be nullptr if there is no previous profiled data.

 

uint32_t *numSubCmd

 

Number of sub-commands in *qosData.

This field could be nullptr if there is no previous profiled data.

 

uint32_t numSubgraph

 

Number of subgraphs.

This field should be zero if there is no previous profiled data.

 

struct QoSData

#include <Types.h>

Raw data for QoS configuration. All of these fields should be filled with the profiled data.

Note

This struct is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Public Members

uint64_t execTime

Profiled execution time : the profiled execution time (in usec).

uint32_t suggestedTime

Suggested time : the suggested time (in msec).

uint32_t bandwidth

Profiled bandwidth : the profiled bandwidth (in MB/s).

uint8_t boostValue

Profiled boost value : the profiled executing boost value (ranging from 0 to 100).

struct QoSOptions

#include <Types.h>

QoS Option for configuration.

Public Members

RuntimeAPIQoSPreference preference

Execution preference: NEURONRUNTIME_PREFER_PERFORMANCE, NEURONRUNTIME_PREFER_POWER, or NEURONRUNTIME_HINT_TURBO_BOOST.

RuntimeAPIQoSPriority priority

Task priority: NEURONRUNTIME_PRIORITY_HIGH, NEURONRUNTIME_PRIORITY_MED, or NEURONRUNTIME_PRIORITY_LOW.

uint8_t boostValue

Boost value hint: a hint for the device frequency, ranging from 0 (lowest) to 100 (highest). This value is the hint for the baseline boost value in the scheduler, which sets the executing boost value (the actual boost value set in the device) based on the scheduling policy. For inferences with the preference set to NEURONRUNTIME_PREFER_PERFORMANCE, the scheduler guarantees that the executing boost value will not be lower than the boost value hint. On the other hand, for inferences with the preference set to NEURONRUNTIME_PREFER_POWER, the scheduler tries to save power by configuring the executing boost value to a value that is not higher than the boost value hint.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

uint8_t maxBoostValue

Maximum boost value: reserved.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

uint8_t minBoostValue

Minimum boost value: reserved.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

uint16_t deadline

Deadline: the deadline for the inference (in msec). Setting any non-zero value notifies the scheduler that this inference is a real-time task. This field should be zero, unless this inference is a real-time task.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

uint16_t abortTime

Abort time: the maximum inference time for the inference (in msec). If the inference is not completed before the abort time, the scheduler would abort the inference. This field should be zero, unless you wish to abort the inference.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

int32_t delayedPowerOffTime

Delayed power off time: the delayed power off time after inference completes (in msec). The scheduler starts a timer for the time interval defined by the delayed power off time after the inference completes. Once the delayed power off time has expired and there are no other incoming inference requests, the underlying devices are powered off for power-saving purposes. Set this field to NEURONRUNTIME_POWER_OFF_TIME_DEFAULT to use the default power-off policy in the scheduler.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

RuntimeAPIQoSPowerPolicy powerPolicy

Power policy: configure power policy for scheduler.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

RuntimeAPIQoSAppType applicationType

Application type: hint for the application type for the inference.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

ProfiledQoSData *profiledQoSData

Profiled QoS Data: pointer to the historical QoS data of previous inferences. If there is no profiled data, this field could be nullptr. For the details, please check the ProfiledQoSData part.

Note

This member is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

struct RuntimeAPIDimensions

#include <Types.h>

The aligned sizes of dimensions.

Public Members

uint32_t dimensions[RuntimeAPIDimIndex::DimensionSize]
 
file Types.h

#include <stddef.h>

#include <stdint.h>

#include <sys/cdefs.h>

 

Common type definitions.

Enums

enum RuntimeAPIDimIndex

Values:

enumerator N

Batch dimension index.

enumerator H

Height dimension index.

enumerator W

Width dimension index.

enumerator C

Channel dimension index.

enumerator Invalid
 
enumerator DimensionSize

Dimension size.

enum RuntimeAPIQoSPreference

Execution preference.

Note

This enum is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Values:

enumerator NEURONRUNTIME_PREFER_PERFORMANCE

Prefer performance.

enumerator NEURONRUNTIME_PREFER_POWER

Prefer low power.

enumerator NEURONRUNTIME_HINT_TURBO_BOOST

Hint for turbo boost mode. Only valid for certain platforms (e.g., DX-1). For other platforms without turbo boost mode support, the behavior of NEURONRUNTIME_HINT_TURBO_BOOST is identical to NEURONRUNTIME_PREFER_PERFORMANCE.

enum RuntimeAPIQoSPriority

Task priority.

Values:

enumerator NEURONRUNTIME_PRIORITY_LOW

Low priority.

enumerator NEURONRUNTIME_PRIORITY_MED

Medium priority.

enumerator NEURONRUNTIME_PRIORITY_HIGH

High priority.

enum RuntimeAPIQoSBoostValue

Special boost value hint.

Note

This enum is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Values:

enumerator NEURONRUNTIME_BOOSTVALUE_PROFILED

101: Hint to notify the scheduler to use the profiled boost value.

enumerator NEURONRUNTIME_BOOSTVALUE_MAX

100: Maximum boost value

enumerator NEURONRUNTIME_BOOSTVALUE_MIN

0: Minimum boost value

enum RuntimeAPIQoSDelayedPowerOffTime

Delayed power off time.

Note

This enum is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Values:

enumerator NEURONRUNTIME_POWER_OFF_TIME_DEFAULT

Default power off time.

enum RuntimeAPIQoSPowerPolicy

Power policy.

Note

This enum is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Values:

enumerator NEURONRUNTIME_POWER_POLICY_DEFAULT

Default policy.

enum RuntimeAPIQoSAppType

Application type.

Note

This enum is not supported on MediaTek TV platforms (MT58xx/MT76xx/MT90xx/MT96xx/MT99xx).

Values:

enumerator NEURONRUNTIME_APP_NORMAL

Normal type.

enum RuntimeAPIErrorCode

A Neuron Runtime API call returns an error code to indicate the status of execution.

Values:

enumerator NEURONRUNTIME_NO_ERROR

0: The API call completed successfully.

enumerator NEURONRUNTIME_OUT_OF_MEMORY

1: There is not enough memory to complete the API call.

enumerator NEURONRUNTIME_INCOMPLETE

2: Not in use.

enumerator NEURONRUNTIME_UNEXPECTED_NULL

3: A required pointer is null.

enumerator NEURONRUNTIME_BAD_DATA

4: Failed to load data or set input/output.

enumerator NEURONRUNTIME_BAD_STATE

5: Not in use.

enumerator NEURONRUNTIME_RUNTIME_ERROR

6: The hardware or simulator returned unexpectedly.

5.5. OpenVX API Reference

Header for MediaTek extension to OpenVX.

Defines

VX_KERNEL_NAME_MEDIATEK_IMAGE_TO_TENSOR

Kernel name for image to tensor.

VX_KERNEL_NAME_MEDIATEK_TENSOR_TO_IMAGE

Kernel name for tensor to image.

VX_KERNEL_NAME_MEDIATEK_TFLITE

Kernel name for TFLite.

VX_KERNEL_NAME_MEDIATEK_REQUANTIZE

Kernel name for requantize.

VX_KERNEL_NAME_MEDIATEK_WEIGHTED_IMAGE_ADD

Kernel name for weighted_image_add.

VX_KERNEL_NAME_MEDIATEK_LABELING

Kernel name for labeling.

VX_KERNEL_NAME_MEDIATEK_SURFACE11x11

Kernel name for surface11x11.

VX_KERNEL_NAME_MEDIATEK_MULTIPLY_SCALAR

Kernel name for multiplyscalar.

VX_KERNEL_NAME_MEDIATEK_NORM

Kernel name for norm.

VX_KERNEL_NAME_MEDIATEK_ADD_SQUARED

Kernel name for add_squared.

VX_KERNEL_NAME_MEDIATEK_ROUND

Kernel name for round.

VX_KERNEL_NAME_MEDIATEK_FLIP

Kernel name for flip.

VX_KERNEL_NAME_MEDIATEK_TRANSPOSE

Kernel name for transpose.

VX_KERNEL_NAME_MEDIATEK_SUM

Kernel name for sum.

VX_KERNEL_NAME_MEDIATEK_BOX5x5

Kernel name for box5x5.

VX_KERNEL_NAME_MEDIATEK_BOX9x9

Kernel name for box9x9.

VX_KERNEL_NAME_MEDIATEK_SQRT

Kernel name for sqrt.

VX_KERNEL_NAME_MEDIATEK_EXP

Kernel name for exp.

VX_KERNEL_NAME_MEDIATEK_BLOCK_DOT_PRODUCT

Kernel name for block_dot_product.

VX_KERNEL_NAME_MEDIATEK_IMAGEBLENDING

Kernel name for imageblending.

VX_KERNEL_NAME_MEDIATEK_LOG

Kernel name for log.

VX_KERNEL_NAME_MEDIATEK_DIVIDE

Kernel name for divide.

VX_KERNEL_NAME_MEDIATEK_GAUSSIAN5x5

Kernel name for gaussianblur5x5.

VX_KERNEL_NAME_MEDIATEK_ROTATE_IMAGE

Kernel name for rotate_image.

VX_KERNEL_NAME_MEDIATEK_MATRIXMULTIPLY

Kernel name for matrixmultiply.

VX_KERNEL_NAME_MEDIATEK_CELLBASEDSUM

Kernel name for cellbasedsum.

VX_KERNEL_NAME_MEDIATEK_ADDWEIGHTED

Kernel name for addweighted.

VX_KERNEL_NAME_MEDIATEK_DILATENxN

Kernel name for dilateNxN.

VX_KERNEL_NAME_MEDIATEK_ERODENxN

Kernel name for erodeNxN.

VX_KERNEL_NAME_MEDIATEK_MEDIAN5x5

Kernel name for median5x5.

VX_KERNEL_NAME_MEDIATEK_HARRISCORNERSIMAGE

Kernel name for OpenCV’s harriscornersImage.

VX_KERNEL_NAME_MEDIATEK_UNPACK

Kernel name for unpack.

VX_KERNEL_NAME_MEDIATEK_BOUNDING_RECT

Kernel name for bounding_rect.

VX_KERNEL_NAME_MEDIATEK_CROSS_PRODUCT

Kernel name for cross_product.

VX_KERNEL_NAME_MEDIATEK_DOT

Kernel name for dot.

VX_KERNEL_NAME_MEDIATEK_HISTOGRAMS16

Kernel name for histograms16.

VX_KERNEL_NAME_MEDIATEK_GOODFEATURETOTRACK

Kernel name for goodfeaturetotrack.

VX_KERNEL_NAME_MEDIATEK_ESTIMATE_AFFINE

Kernel name for estimateAffine or ransacAffine.

VX_KERNEL_NAME_MEDIATEK_FASTCORNERSTENSOR

Kernel name for fastcornerstensor.

VX_KERNEL_NAME_MEDIATEK_OPTICALFLOWPYRLKTENSOR

Kernel name for opticalflowpyrlktensor.

VX_KERNEL_NAME_MEDIATEK_INVERSE

Kernel name for inverse.

VX_KERNEL_NAME_MEDIATEK_SPLIT2

Kernel name for split2.

VX_KERNEL_NAME_MEDIATEK_SPLIT3

Kernel name for split3.

VX_KERNEL_NAME_MEDIATEK_SPLIT4

Kernel name for split4.

VX_KERNEL_NAME_MEDIATEK_CROP

Kernel name for crop.

VX_KERNEL_NAME_MEDIATEK_TFLITE2INS

Kernel name for 2-inputs 1-output TFLite.

VX_KERNEL_NAME_MEDIATEK_TFLITE3INS3OUTS

Kernel name for 3-inputs 3-outputs TFLite.

VX_KERNEL_NAME_MEDIATEK_PACK

Kernel name for pack.

VX_KERNEL_NAME_MEDIATEK_CBCR_SWAP

Kernel name for cbcr swap.

VX_KERNEL_NAME_MEDIATEK_PYRUP

Kernel name for pyrup.

VX_KERNEL_NAME_MEDIATEK_MACRO_BLOCK

Kernel name for macro block.

VX_KERNEL_NAME_MEDIATEK_CONNECTED_COMPONENT

Kernel name for connected component.

VX_LIBRARY_MEDIATEK_APU

MediaTek APU extension library set.

VX_LIBRARY_MEDIATEK_APU_CUST

MediaTek APU customer’s extension library set.

VX_LIBRARY_MEDIATEK_APU_TFLITE

MediaTek APU TFLite extension library set.

VX_TYPE_MTK_JSON

A new array item type of char, used to carry OP options in JSON format.

VX_ID_MEDIATEK_APU

MediaTek APU extension supported enumeration.

Enums

enum vx_kernel_mediatek_ext_e

The list of MediaTek Kernels.

Values:

enumerator VX_KERNEL_MEDIATEK_IMAGE_TO_TENSOR

Image to tensor kernel.

enumerator VX_KERNEL_MEDIATEK_TENSOR_TO_IMAGE

Tensor to image kernel.

enumerator VX_KERNEL_MEDIATEK_TFLITE

TFLite kernel.

enumerator VX_KERNEL_MEDIATEK_REQUANTIZE

Requantize.

enumerator VX_KERNEL_MEDIATEK_WEIGHTED_IMAGE_ADD

WeightedImageAdd.

enumerator VX_KERNEL_MEDIATEK_LABELING

Labeling.

enumerator VX_KERNEL_MEDIATEK_SURFACE11x11

Surface11x11.

enumerator VX_KERNEL_MEDIATEK_MULTIPLY_SCALAR

MultiplyScalar.

enumerator VX_KERNEL_MEDIATEK_NORM

Norm.

enumerator VX_KERNEL_MEDIATEK_ADD_SQUARED

AddSquared.

enumerator VX_KERNEL_MEDIATEK_ROUND

Round.

enumerator VX_KERNEL_MEDIATEK_FLIP

Flip.

enumerator VX_KERNEL_MEDIATEK_TRANSPOSE

Transpose.

enumerator VX_KERNEL_MEDIATEK_SUM

Sum.

enumerator VX_KERNEL_MEDIATEK_BOX5x5

Box5x5.

enumerator VX_KERNEL_MEDIATEK_BOX9x9

Box9x9.

enumerator VX_KERNEL_MEDIATEK_SQRT

Sqrt.

enumerator VX_KERNEL_MEDIATEK_EXP

Exp.

enumerator VX_KERNEL_MEDIATEK_BLOCK_DOT_PRODUCT

BlockDotProduct.

enumerator VX_KERNEL_MEDIATEK_IMAGEBLENDING

ImageBlending.

enumerator VX_KERNEL_MEDIATEK_LOG

Log.

enumerator VX_KERNEL_MEDIATEK_DIVIDE

Divide.

enumerator VX_KERNEL_MEDIATEK_GAUSSIAN5x5

Gaussian5x5.

enumerator VX_KERNEL_MEDIATEK_ROTATE_IMAGE

RotateImage.

enumerator VX_KERNEL_MEDIATEK_MATRIXMULTIPLY

MatrixMultiply.

enumerator VX_KERNEL_MEDIATEK_CELLBASEDSUM

cellbasedsum

enumerator VX_KERNEL_MEDIATEK_ADDWEIGHTED

AddWeighted.

enumerator VX_KERNEL_MEDIATEK_DILATENxN

DilateNxN.

enumerator VX_KERNEL_MEDIATEK_ERODENxN

ErodeNxN.

enumerator VX_KERNEL_MEDIATEK_MEDIAN5x5

Median5x5.

enumerator VX_KERNEL_MEDIATEK_HARRISCORNERSIMAGE

HarrisCornersImage.

enumerator VX_KERNEL_MEDIATEK_UNPACK

Unpack.

enumerator VX_KERNEL_MEDIATEK_BOUNDING_RECT

BoundingRect.

enumerator VX_KERNEL_MEDIATEK_CROSS_PRODUCT

CrossProduct.

enumerator VX_KERNEL_MEDIATEK_DOT

Dot.

enumerator VX_KERNEL_MEDIATEK_HISTOGRAMS16

histograms16

enumerator VX_KERNEL_MEDIATEK_GOODFEATURETOTRACK

goodfeaturetotrack

enumerator VX_KERNEL_MEDIATEK_ESTIMATE_AFFINE

EstimateAffine.

enumerator VX_KERNEL_MEDIATEK_FASTCORNERSTENSOR

fastcornerstensor

enumerator VX_KERNEL_MEDIATEK_OPTICALFLOWPYRLKTENSOR

opticalflowpyrlktensor

enumerator VX_KERNEL_MEDIATEK_INVERSE

inverse

enumerator VX_KERNEL_MEDIATEK_SPLIT2

split2

enumerator VX_KERNEL_MEDIATEK_SPLIT3

split3

enumerator VX_KERNEL_MEDIATEK_SPLIT4

split4

enumerator VX_KERNEL_MEDIATEK_CROP

crop

enumerator VX_KERNEL_MEDIATEK_TFLITE2INS

TFLite 2ins1out.

enumerator VX_KERNEL_MEDIATEK_TFLITE3INS3OUTS

TFLite 3ins3outs.

enumerator VX_KERNEL_MEDIATEK_PACK

pack

enumerator VX_KERNEL_MEDIATEK_CBCR_SWAP

cbcr swap

enumerator VX_KERNEL_MEDIATEK_PYRUP

pyrup

enumerator VX_KERNEL_MEDIATEK_MACRO_BLOCK

macro block

enumerator VX_KERNEL_MEDIATEK_CONNECTED_COMPONENT

connected component

enumerator VX_KERNEL_CUSTOM_CONVERTER

Generalized converter for custom nodes. Newly added custom nodes should start from 0x1.

enumerator VX_KERNEL_MEDIATEK_TFLITE_BASE

A base enum for dynamically adding MIMO TFLite kernels. Do NOT add kernels in this library.

enum vx_enum_mediatek_e

The set of supported enumerations in MediaTek OpenVX.

These can be extracted from enumerated values using VX_ENUM_TYPE.

Values:

enumerator VX_ENUM_MEDIATEK_NORM_TYPE

A norm type.

enum vx_mediatek_norm_type_e

A normalization type.

 

See also

group_vision_function_canny

 

Values:

enumerator VX_MEDIATEK_NORM_INF

The INF normalization.

enumerator VX_MEDIATEK_NORM_L1

The L1 normalization.

enumerator VX_MEDIATEK_NORM_L2

The L2 normalization.

enumerator VX_MEDIATEK_NORM_L2SQR

The L2SQR normalization.

enumerator VX_MEDIATEK_NORM_HAMMING

The HAMMING normalization.

Functions

vx_node mvxImageToTensorNode(vx_graph graph, vx_image input, vx_tensor output)

[Graph] MediaTek provided node which reshapes an image to a tensor.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • output – [out] The output tensor.

vx_node mvxTensorToImageNode(vx_graph graph, vx_tensor input, vx_image output)

[Graph] MediaTek provided node which reshapes a tensor to an image.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • output – [out] The output image.

vx_node mvxTFLiteNode(vx_graph graph, vx_tensor input, vx_float32 in_scale, vx_int32 in_zeroPoint, vx_array tflite, vx_float32 out_scale, vx_int32 out_zeroPoint, vx_tensor output)

[Graph] MediaTek provided node which takes a tflite buffer as a neural network model.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • in_scale – [in] Scale value in the TFLite quantization parameters of input tensor.

  • in_zeroPoint – [in] Zero point in the TFLite quantization parameters of input tensor.

  • tflite – [in] A TFLite buffer stored in vx_array of VX_TYPE_CHAR

  • out_scale – [in] Scale value in the TFLite quantization parameters of output tensor.

  • out_zeroPoint – [in] Zero point in the TFLite quantization parameters of output tensor.

  • output – [out] The output tensor.

vx_node mvxRequantizeNode(vx_graph graph, vx_tensor input, vx_float32 scale, vx_int32 zeroPoint, vx_tensor output)

[Graph] MediaTek provided node which performs a requantize operation on the input tensor based on the given quantization parameters. Refer to the description of TFLite quantized tensors: https://www.tensorflow.org/lite/performance/post_training_quantization#representation_for_quantized_tensors

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • scale – [in] Scale parameter of TFLite quantized tensors.

  • zeroPoint – [in] Zero point parameter of TFLite quantized tensors.

  • output – [out] The output tensor.

vx_node mvxWeightedImageAddNode(vx_graph graph, vx_image input1, vx_image input2, vx_image ratio, vx_image output)

[Graph] MediaTek provided node which performs weighted image add.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • ratio – [in] The ratio image.

  • output – [out] The output image.

vx_status mvxuWeightedImageAdd(vx_context context, vx_image input1, vx_image input2, vx_image ratio, vx_image output)

[Immediate] Immediate mode version of mvxWeightedImageAddNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • ratio – [in] The ratio image.

  • output – [out] The output image.

vx_node mvxLabelingNode(vx_graph graph, vx_image input, vx_int8 connectivity, vx_image output, vx_image lut)

[Graph] MediaTek provided node which performs labeling.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • connectivity – [in] .

  • output – [out] The output image.

  • lut – [out] The output Look-Up Table.

vx_status mvxuLabeling(vx_context context, vx_image input, vx_int8 connectivity, vx_image output, vx_image lut)

[Immediate] Immediate mode version of mvxLabelingNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • connectivity – [in] .

  • output – [out] The output image.

  • lut – [out] The output Look-Up Table.

vx_node mvxSurface11x11Node(vx_graph graph, vx_image input, vx_int8 y, vx_image output)

[Graph] MediaTek provided node which performs a surface filter with window size 11 on the input.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • y – [in] Threshold value.

  • output – [out] The output image.

vx_status mvxuSurface11x11(vx_context context, vx_image input, vx_int8 y, vx_image output)

[Immediate] Immediate mode version of mvxSurface11x11Node.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • y – [in] Threshold value.

  • output – [out] The output image.

vx_node mvxMultiplyScalarNode(vx_graph graph, vx_image input, vx_int16 scalar, vx_uint8 shift, vx_enum policy, vx_image output)

[Graph] MediaTek provided node which performs image multiplication with a scalar value

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • scalar – [in] The scalar value.

  • shift – [in] The right shift value.

  • policy – [in] The overflow policy.

  • output – [out] The output image.

vx_status mvxuMultiplyScalar(vx_context context, vx_image input, vx_int16 scalar, vx_uint8 shift, vx_enum policy, vx_image output)

[Immediate] Immediate mode version of mvxMultiplyScalarNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • scalar – [in] The scalar value.

  • shift – [in] The right shift value.

  • policy – [in] The overflow policy.

  • output – [out] The output image.

vx_node mvxNormNode(vx_graph graph, vx_image input1, vx_image input2, vx_enum type, vx_image output)

[Graph] MediaTek provided node which performs image normalization

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • type – [in] The normalization type.

  • output – [out] The output image.

vx_status mvxuNorm(vx_context context, vx_image input1, vx_image input2, vx_enum type, vx_image output)

[Immediate] Immediate mode version of mvxNormNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • type – [in] The normalization type.

  • output – [out] The output image.

vx_node mvxAddSquaredNode(vx_graph graph, vx_image input1, vx_image input2, vx_int8 shift, vx_image output)

[Graph] MediaTek provided node which performs add with squared input

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input image which will be squared.

  • input2 – [in] The input image.

  • shift – [in] The right shift value (< 0 for left shift).

  • output – [out] The output image.

vx_status mvxuAddSquared(vx_context context, vx_image input1, vx_image input2, vx_int8 shift, vx_image output)

[Immediate] Immediate mode version of mvxAddSquaredNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input1 – [in] The input image which will be squared.

  • input2 – [in] The input image.

  • shift – [in] The right shift value (< 0 for left shift).

  • output – [out] The output image.

vx_node mvxRoundNode(vx_graph graph, vx_tensor input, vx_tensor output)

[Graph] MediaTek provided node which performs round.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_status mvxuRound(vx_context context, vx_tensor input, vx_tensor output)

[Immediate] Immediate mode version of mvxRoundNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_node mvxFlipNode(vx_graph graph, vx_image input, vx_int8 flip_code, vx_image output)

[Graph] MediaTek provided node which performs flip

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • flip_code – [in] A flag to specify how to flip.

  • output – [out] The output image.

vx_status mvxuFlip(vx_context context, vx_image input, vx_int8 flip_code, vx_image output)

[Immediate] Immediate mode version of mvxFlipNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • flip_code – [in] A flag to specify how to flip.

  • output – [out] The output image.

vx_node mvxTransposeNode(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided node which performs transpose

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • output – [out] The output image.

vx_status mvxuTranspose(vx_context context, vx_image input, vx_image output)

[Immediate] Immediate mode version of mvxTransposeNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • output – [out] The output image.

vx_node mvxSumNode(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided node which performs image sum

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • output – [out] The output image.

vx_status mvxuSum(vx_context context, vx_image input, vx_image output)

[Immediate] Immediate mode version of mvxSumNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • output – [out] The output image.

vx_node vxBox5x5Node(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided an extended Box Filter node which supports a 5x5 kernel

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image in VX_DF_IMAGE_U8 format.

  • output – [out] The output image in VX_DF_IMAGE_U8 format, which must have the same dimensions as the input image.

Return values:

vx_node – A node reference. Any possible errors preventing a successful creation should be checked using vxGetStatus

Returns:

vx_node.

vx_status mvxuBox5x5(vx_context context, vx_image input, vx_image output)

[Immediate] Computes a box filter on the image by a 5x5 window.

Parameters:
  • context – [in] The reference to the overall context.

  • input – [in] The input image in VX_DF_IMAGE_U8 format.

  • output – [out] The output image in VX_DF_IMAGE_U8 format.

Return values:
  • VX_SUCCESS – Success

  • * – An error occurred. See vx_status_e.

Returns:

vx_status_e enumeration.

vx_node vxBox9x9Node(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided an extended Box Filter node which supports a 9x9 kernel

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image in VX_DF_IMAGE_U8 format.

  • output – [out] The output image in VX_DF_IMAGE_U8 format, which must have the same dimensions as the input image.

Return values:

vx_node – A node reference. Any possible errors preventing a successful creation should be checked using vxGetStatus

Returns:

vx_node.

vx_status mvxuBox9x9(vx_context context, vx_image input, vx_image output)

[Immediate] Computes a box filter on the image by a 9x9 window.

Parameters:
  • context – [in] The reference to the overall context.

  • input – [in] The input image in VX_DF_IMAGE_U8 format.

  • output – [out] The output image in VX_DF_IMAGE_U8 format.

Return values:
  • VX_SUCCESS – Success

  • * – An error occurred. See vx_status_e.

Returns:

vx_status_e enumeration.

vx_node mvxSqrtNode(vx_graph graph, vx_tensor input, vx_tensor output)

[Graph] MediaTek provided node which performs sqrt

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_status mvxuSqrt(vx_context context, vx_tensor input, vx_tensor output)

[Immediate] Immediate mode version of mvxSqrtNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_node mvxExpNode(vx_graph graph, vx_tensor input, vx_tensor output)

[Graph] MediaTek provided node which performs exp

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_status mvxuExp(vx_context context, vx_tensor input, vx_tensor output)

[Immediate] Immediate mode version of mvxExpNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_node mvxBlockDotProductNode(vx_graph graph, vx_image input1, vx_image input2, vx_image output)

[Graph] MediaTek provided node which performs image block dot product.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • output – [out] The output image.

vx_status mvxuBlockDotProduct(vx_context context, vx_image input1, vx_image input2, vx_image output)

[Immediate] Immediate mode version of mvxBlockDotProductNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • output – [out] The output image.

vx_node mvxImageBlendingNode(vx_graph graph, vx_image input1, vx_image input2, vx_image input_map, vx_int32 mapsum, vx_image output)

[Graph] MediaTek provided node which performs image blending.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • input_map – [in] The ratio image.

  • mapsum – [in]

  • output – [out] The output image.

vx_status mvxuImageBlending(vx_context context, vx_image input1, vx_image input2, vx_image input_map, vx_int32 mapsum, vx_image output)

[Immediate] Immediate mode version of mvxImageBlendingNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • input_map – [in] The ratio image.

  • mapsum – [in]

  • output – [out] The output image.

vx_node mvxLogNode(vx_graph graph, vx_tensor input, vx_tensor output)

[Graph] MediaTek provided node which performs log.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_status mvxuLog(vx_context context, vx_tensor input, vx_tensor output)

[Immediate] Immediate mode version of mvxLogNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_node mvxDivideNode(vx_graph graph, vx_image input1, vx_image input2, vx_bool is_quantize, vx_image output)

[Graph] MediaTek provided node which performs divide.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • is_quantize – [in]

  • output – [out] The output image.

vx_status mvxuDivide(vx_context context, vx_image input1, vx_image input2, vx_bool is_quantize, vx_image output)

[Immediate] Immediate mode version of mvxDivideNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • is_quantize – [in]

  • output – [out] The output image.

vx_node mvxGaussian5x5Node(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided node which performs Gaussian5x5

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • output – [out] The output image.

vx_status mvxuGaussian5x5(vx_context context, vx_image input, vx_image output)

[Immediate] Immediate mode version of mvxGaussian5x5Node.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • output – [out] The output image.

vx_node mvxRotateImageNode(vx_graph graph, vx_image input, vx_int8 rotate_code, vx_image output)

[Graph] MediaTek provided node which performs rotate

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • rotate_code – [in] A flag to specify how to rotate.

  • output – [out] The output image.

vx_status mvxuRotateImage(vx_context context, vx_image input, vx_int8 rotate_code, vx_image output)

[Immediate] Immediate mode version of mvxRotateImageNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • rotate_code – [in] A flag to specify how to rotate.

  • output – [out] The output image.

vx_node mvxMatrixMultiplyNode(vx_graph graph, vx_image input1, vx_image input2, vx_image output)

[Graph] MediaTek provided node which performs matrix multiply

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • output – [out] The output image.

vx_status mvxuMatrixMultiply(vx_context context, vx_image input1, vx_image input2, vx_image output)

[Immediate] Immediate mode version of mvxMatrixMultiplyNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input1 – [in] The input image.

  • input2 – [in] The input image.

  • output – [out] The output image.

vx_node mvxAddWeightedNode(vx_graph graph, vx_image input1, vx_image input2, vx_float32 alpha, vx_float32 beta, vx_float32 gamma, vx_image output)

[Graph] MediaTek provided node which performs addweighted

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The first input image.

  • input2 – [in] The second input image.

  • alpha – [in] The alpha value.

  • beta – [in] The beta value.

  • gamma – [in] The gamma value.

  • output – [out] The output image.

vx_status mvxuAddWeighted(vx_context context, vx_image input1, vx_image input2, vx_float32 alpha, vx_float32 beta, vx_float32 gamma, vx_image output)

[Immediate] Immediate mode version of mvxAddWeightedNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input1 – [in] The first input image.

  • input2 – [in] The second input image.

  • alpha – [in] The alpha value.

  • beta – [in] The beta value.

  • gamma – [in] The gamma value.

  • output – [out] The output image.

vx_node mvxDilateNxNNode(vx_graph graph, vx_image input, vx_int8 kernel_size, vx_image output)

[Graph] MediaTek provided node which performs DilateNxN

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • kernel_size – [in] The kernel size.

  • output – [out] The output image.

vx_status mvxuDilateNxN(vx_context context, vx_image input, vx_int8 kernel_size, vx_image output)

[Immediate] Immediate mode version of mvxDilateNxNNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • kernel_size – [in] The kernel size.

  • output – [out] The output image.

vx_node mvxErodeNxNNode(vx_graph graph, vx_image input, vx_int8 kernel_size, vx_image output)

[Graph] MediaTek provided node which performs ErodeNxN

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • kernel_size – [in] The kernel size.

  • output – [out] The output image.

vx_status mvxuErodeNxN(vx_context context, vx_image input, vx_int8 kernel_size, vx_image output)

[Immediate] Immediate mode version of mvxErodeNxNNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • kernel_size – [in] The kernel size.

  • output – [out] The output image.

vx_node mvxMedian5x5Node(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided node which performs Median5x5

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • output – [out] The output image.

vx_status mvxuMedian5x5(vx_context context, vx_image input, vx_image output)

[Immediate] Immediate mode version of mvxMedian5x5Node.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • output – [out] The output image.

vx_node mvxCellBasedSumNode(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided node which performs CellBasedSum

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • output – [out] The output image.

vx_status mvxuCellBasedSum(vx_context context, vx_image input, vx_image output)

[Immediate] Immediate mode version of mvxCellBasedSumNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • output – [out] The output image.

vx_node mvxHarrisCornersImageNode(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided node which performs HarrisCornersImage

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • output – [out] The output image.

vx_status mvxuHarrisCornersImage(vx_context context, vx_image input, vx_image output)

[Immediate] Immediate mode version of mvxHarrisCornersImageNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input image.

  • output – [out] The output image.

vx_node mvxUnpackNode(vx_graph graph, vx_image input, vx_bool is_low, vx_enum unpack_mode, vx_image output)

[Graph] MediaTek provided node which performs Unpack

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input image.

  • is_low – [in] Whether to take the low part when unpacking bits.

  • unpack_mode – [in] Unpack mode. 0: 8->16, 1: 10->16, 2: 12->16, 3: 14->16.

  • output – [out] The output image.

vx_status mvxuUnpack(vx_context context, vx_image input, vx_bool is_low, vx_enum unpack_mode, vx_image output)

[Immediate] Immediate mode version of mvxUnpackNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input image.

  • is_low – [in] Whether to take the low part when unpacking bits.

  • unpack_mode – [in] Unpack mode. 0: 8->16, 1: 10->16, 2: 12->16, 3: 14->16.

  • output – [out] The output image.

vx_node mvxBoundingRectNode(vx_graph graph, vx_tensor input, vx_tensor output)

[Graph] MediaTek provided node which performs BoundingRect.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_status mvxuBoundingRect(vx_context context, vx_tensor input, vx_tensor output)

[Immediate] Immediate mode version of mvxBoundingRectNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input tensor.

  • output – [out] The output tensor.

vx_node mvxCrossProductNode(vx_graph graph, vx_tensor input1, vx_tensor input2, vx_tensor output)

[Graph] MediaTek provided node which performs CrossProduct.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input tensor.

  • input2 – [in] The input tensor.

  • output – [out] The output tensor.

vx_status mvxuCrossProduct(vx_context context, vx_tensor input1, vx_tensor input2, vx_tensor output)

[Immediate] Immediate mode version of mvxCrossProductNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input1 – [in] The input tensor.

  • input2 – [in] The input tensor.

  • output – [out] The output tensor.

vx_node mvxDotNode(vx_graph graph, vx_tensor input1, vx_tensor input2, vx_tensor output)

[Graph] MediaTek provided node which performs Dot.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The input tensor.

  • input2 – [in] The input tensor.

  • output – [out] The output tensor.

vx_status mvxuDot(vx_context context, vx_tensor input1, vx_tensor input2, vx_tensor output)

[Immediate] Immediate mode version of mvxDotNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input1 – [in] The input tensor.

  • input2 – [in] The input tensor.

  • output – [out] The output tensor.

vx_node mvxHistogramS16Node(vx_graph graph, vx_image input, vx_tensor output)

[Graph] MediaTek provided node which performs HistogramS16.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_image.

  • output – [out] The output tensor.

vx_status mvxuHistogramS16(vx_context context, vx_image input, vx_tensor output)

[Immediate] Immediate mode version of mvxHistogramS16Node.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_image.

  • output – [out] The output tensor.

vx_node mvxGoodFeatureToTrackNode(vx_graph graph, vx_image input, vx_tensor output0, vx_tensor output1)

[Graph] MediaTek provided node which performs GoodFeatureToTrack.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_image.

  • output0 – [out] The first output tensor.

  • output1 – [out] The second output tensor.

vx_status mvxuGoodFeatureToTrack(vx_context context, vx_image input, vx_tensor output0, vx_tensor output1)

[Immediate] Immediate mode version of mvxGoodFeatureToTrackNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_image.

  • output0 – [out] The first output tensor.

  • output1 – [out] The second output tensor.

vx_node mvxEstimateAffineNode(vx_graph graph, vx_array input_from_points, vx_array input_to_points, vx_array output_model, vx_int32 data_size, vx_enum method, vx_int32 min_num_inliers, vx_int32 max_iters, vx_float32 threshold_dist, vx_float32 threshold_mme)

[Graph] MediaTek provided node which performs estimate_affine

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input_from_points – [in] The point set of the original image.

  • input_to_points – [in] The point set of the transformed image.

  • output_model – [out] The output model.

  • data_size – [in] .

  • method – [in] The method to estimate affine.

  • min_num_inliers – [in] .

  • max_iters – [in] .

  • threshold_dist – [in] .

  • threshold_mme – [in] .

vx_status mvxuEstimateAffine(vx_context context, vx_array input_from_points, vx_array input_to_points, vx_array output_model, vx_int32 data_size, vx_enum method, vx_int32 min_num_inliers, vx_int32 max_iters, vx_float32 threshold_dist, vx_float32 threshold_mme)

[Immediate] Immediate mode version of mvxEstimateAffineNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input_from_points – [in] The point set of the original image.

  • input_to_points – [in] The point set of the transformed image.

  • output_model – [out] The output model.

  • data_size – [in] .

  • method – [in] The method to estimate affine.

  • min_num_inliers – [in] .

  • max_iters – [in] .

  • threshold_dist – [in] .

  • threshold_mme – [in] .

vx_node mvxFastCornersTensorNode(vx_graph graph, vx_image input, vx_tensor output0, vx_tensor output1)

[Graph] MediaTek provided node which performs FastCornersTensor.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_image.

  • output0 – [out] The first output tensor.

  • output1 – [out] The second output tensor.

vx_status mvxuFastCornersTensor(vx_context context, vx_image input, vx_tensor output0, vx_tensor output1)

[Immediate] Immediate mode version of mvxFastCornersTensorNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_image.

  • output0 – [out] The first output tensor.

  • output1 – [out] The second output tensor.

vx_node mvxOpticalFlowPyrLKTensorNode(vx_graph graph, vx_image input0, vx_image input1, vx_tensor input2, vx_tensor input3, vx_tensor Output0, vx_tensor Output1)

[Graph] MediaTek provided node which performs OpticalFlowPyrLKTensor.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input0 – [in] The input vx_image.

  • input1 – [in] The input vx_image.

  • input2 – [in] The input tensor.

  • input3 – [in] The input tensor.

  • output0 – [out] The output tensor.

  • output1 – [out] The output tensor.

vx_status mvxuOpticalFlowPyrLKTensor(vx_context context, vx_image input0, vx_image input1, vx_tensor input2, vx_tensor input3, vx_tensor Output0, vx_tensor Output1)

[Immediate] Immediate mode version of mvxOpticalFlowPyrLKTensorNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input0 – [in] The input vx_image.

  • input1 – [in] The input vx_image.

  • input2 – [in] The input tensor.

  • input3 – [in] The input tensor.

  • output0 – [out] The output tensor.

  • output1 – [out] The output tensor.

vx_node mvxInverseNode(vx_graph graph, vx_tensor input, vx_int8 decomp_type, vx_tensor output)

[Graph] MediaTek provided node which performs matrix inverse

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input tensor.

  • decomp_type – [in] A flag to specify how to decompose the matrix.

  • output – [out] The output tensor.

vx_status mvxuInverse(vx_context context, vx_tensor input, vx_int8 decomp_type, vx_tensor output)

[Immediate] Immediate mode version of mvxInverseNode.

Parameters:
  • context – [in] The overall context of the implementation.

  • input – [in] The input tensor.

  • decomp_type – [in] A flag to specify how to decompose the matrix.

  • output – [out] The output tensor.

vx_node mvxSplit2Node(vx_graph graph, vx_tensor input, vx_tensor output0, vx_tensor output1)

[Graph] MediaTek provided node which splits the input tensor into 2 channels.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_tensor.

  • output0 – [out] The output vx_tensor.

  • output1 – [out] The output vx_tensor.

vx_status mvxuSplit2(vx_context context, vx_tensor input, vx_tensor output0, vx_tensor output1)

[Immediate] Immediate mode version of mvxSplit2Node.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_tensor.

  • output0 – [out] The output vx_tensor.

  • output1 – [out] The output vx_tensor.

vx_node mvxSplit3Node(vx_graph graph, vx_tensor input, vx_tensor output0, vx_tensor output1, vx_tensor output2)

[Graph] MediaTek provided node which splits the input tensor into 3 channels.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_tensor.

  • output0 – [out] The output vx_tensor.

  • output1 – [out] The output vx_tensor.

  • output2 – [out] The output vx_tensor.

vx_status mvxuSplit3(vx_context context, vx_tensor input, vx_tensor output0, vx_tensor output1, vx_tensor output2)

[Immediate] Immediate mode version of mvxSplit3Node.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_tensor.

  • output0 – [out] The output vx_tensor.

  • output1 – [out] The output vx_tensor.

  • output2 – [out] The output vx_tensor.

vx_node mvxSplit4Node(vx_graph graph, vx_tensor input, vx_tensor output0, vx_tensor output1, vx_tensor output2, vx_tensor output3)

[Graph] MediaTek provided node which splits the input tensor into 4 channels.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_tensor.

  • output0 – [out] The output vx_tensor.

  • output1 – [out] The output vx_tensor.

  • output2 – [out] The output vx_tensor.

  • output3 – [out] The output vx_tensor.

vx_status mvxuSplit4(vx_context context, vx_tensor input, vx_tensor output0, vx_tensor output1, vx_tensor output2, vx_tensor output3)

[Immediate] Immediate mode version of mvxSplit4Node.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_tensor.

  • output0 – [out] The output vx_tensor.

  • output1 – [out] The output vx_tensor.

  • output2 – [out] The output vx_tensor.

  • output3 – [out] The output vx_tensor.

vx_node mvxCropNode(vx_graph graph, vx_image input, vx_int32 crop_w_offset, vx_int32 crop_h_offset, vx_int32 crop_width, vx_int32 crop_height, vx_image output)

[Graph] MediaTek provided node which performs crop.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_image.

  • crop_w_offset – [in] The crop width offset.

  • crop_h_offset – [in] The crop height offset.

  • crop_width – [in] The number of columns to crop from the source image.

  • crop_height – [in] The number of rows to crop from the source image.

  • output – [out] The output vx_image.

vx_status mvxuCrop(vx_context context, vx_image input, vx_int32 crop_w_offset, vx_int32 crop_h_offset, vx_int32 crop_width, vx_int32 crop_height, vx_image output)

[Immediate] Immediate mode version of mvxCropNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_image.

  • crop_w_offset – [in] The crop width offset.

  • crop_h_offset – [in] The crop height offset.

  • crop_width – [in] The number of columns to crop from the source image.

  • crop_height – [in] The number of rows to crop from the source image.

  • output – [out] The output vx_image.

vx_node mvxTFLite2InsNode(vx_graph graph, vx_tensor input1, vx_tensor input2, vx_array in_scales, vx_array in_zeroPoints, vx_array tflite, vx_float32 out_scale, vx_int32 out_zeroPoint, vx_tensor output)

[Graph] MediaTek provided node which takes a tflite buffer as a neural network model.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The first input tensor.

  • input2 – [in] The second input tensor.

  • in_scales – [in] Scale values in the TFLite quantization parameters of input tensor.

  • in_zeroPoints – [in] Zero points in the TFLite quantization parameters of input tensor.

  • tflite – [in] A TFLite buffer stored in vx_array of VX_TYPE_CHAR

  • out_scale – [in] Scale value in the TFLite quantization parameters of the output tensor.

  • out_zeroPoint – [in] Zero point in the TFLite quantization parameters of the output tensor.

  • output – [out] The output tensor.

vx_node mvxTFLite3Ins3OutsNode(vx_graph graph, vx_tensor input1, vx_tensor input2, vx_tensor input3, vx_array in_scales, vx_array in_zeroPoints, vx_array tflite, vx_array out_scales, vx_array out_zeroPoints, vx_tensor output1, vx_tensor output2, vx_tensor output3)

[Graph] MediaTek provided node which takes a tflite buffer as a neural network model.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input1 – [in] The first input tensor.

  • input2 – [in] The second input tensor.

  • input3 – [in] The third input tensor.

  • in_scales – [in] Scale values in the TFLite quantization parameters of the input tensors.

  • in_zeroPoints – [in] Zero points in the TFLite quantization parameters of the input tensors.

  • tflite – [in] A TFLite buffer stored in vx_array of VX_TYPE_CHAR

  • out_scales – [in] Scale values in the TFLite quantization parameters of output tensors.

  • out_zeroPoints – [in] Zero points in the TFLite quantization parameters of output tensors.

  • output1 – [out] The first output tensor.

  • output2 – [out] The second output tensor.

  • output3 – [out] The third output tensor.

vx_node mvxPackNode(vx_graph graph, vx_image input, vx_bool is_low, vx_enum pack_mode, vx_image output)

[Graph] MediaTek provided node which performs Pack.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_image.

  • is_low – [in] Whether to take the low part when packing bits.

  • pack_mode – [in] Pack mode. 1: 16->10, 2: 16->12

  • output – [out] The output vx_image.

vx_status mvxuPack(vx_context context, vx_image input, vx_bool is_low, vx_enum pack_mode, vx_image output)

[Immediate] Immediate mode version of mvxPackNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_image.

  • is_low – [in] Whether to take the low part when packing bits.

  • pack_mode – [in] Pack mode. 1: 16->10, 2: 16->12

  • output – [out] The output vx_image.

vx_node mvxCbcrSwapNode(vx_graph graph, vx_tensor input, vx_tensor output)

[Graph] MediaTek provided node which performs CbCr swap.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_tensor.

  • output – [out] The output vx_tensor.

vx_status mvxuCbcrSwap(vx_context context, vx_tensor input, vx_tensor output)

[Immediate] Immediate mode version of mvxCbcrSwapNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_tensor.

  • output – [out] The output vx_tensor.

vx_node mvxPyrUpNode(vx_graph graph, vx_image input, vx_image output)

[Graph] MediaTek provided node which performs PyrUp.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_image.

  • output – [out] The output vx_image.

vx_status mvxuPyrUp(vx_context context, vx_image input, vx_image output)

[Immediate] Immediate mode version of mvxPyrUpNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_image.

  • output – [out] The output vx_image.

vx_node mvxMacroBlockNode(vx_graph graph, vx_image input, vx_tensor threshold, vx_image output)

[Graph] MediaTek provided node which performs macro block.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_image.

  • threshold – [in] The threshold tensor.

  • output – [out] The output vx_image.

vx_status mvxuMacroBlock(vx_context context, vx_image input, vx_tensor threshold, vx_image output)

[Immediate] Immediate mode version of mvxMacroBlockNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_image.

  • threshold – [in] The threshold tensor.

  • output – [out] The output vx_image.

vx_node mvxConnectedComponentNode(vx_graph graph, vx_image input, vx_int8 connectivity, vx_image output)

[Graph] MediaTek provided node which performs connected component labeling.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • input – [in] The input vx_image.

  • connectivity – [in] The input connectivity.

  • output – [out] The output vx_image.

vx_status mvxuConnectedComponent(vx_context context, vx_image input, vx_int8 connectivity, vx_image output)

[Immediate] Immediate mode version of mvxConnectedComponentNode.

Parameters:
  • context – [in] The handle to the context in which to instantiate the node.

  • input – [in] The input vx_image.

  • connectivity – [in] The input connectivity.

  • output – [out] The output vx_image.

vx_node mvxTFLiteNInsNOutsNode(vx_graph graph, const vx_tensor inputs[], const vx_uint32 in_num, const vx_array in_scales, const vx_array in_zeroPoints, const vx_array tflite, const vx_tensor outputs[], const vx_uint32 out_num, const vx_array out_scales, const vx_array out_zeroPoints)

[Graph] MediaTek provided node which takes a tflite buffer as a neural network model, with a configurable number of inputs and outputs.

Parameters:
  • graph – [in] The handle to the graph in which to instantiate the node.

  • inputs – [in] The input vx_tensor array.

  • in_num – [in] The number of input vx_tensors.

  • in_scales – [in] The scales for the input tensors.

  • in_zeroPoints – [in] The zero points for the input tensors.

  • tflite – [in] A TFLite buffer stored in vx_array of VX_TYPE_CHAR

  • outputs – [out] The output vx_tensor array.

  • out_num – [in] The number of output vx_tensors.

  • out_scales – [in] The scales for the output tensors.

  • out_zeroPoints – [in] The zero points for the output tensors.

5.6. Quantization Tool API Documentation

5.6.1. Backward-Incompatible Changes

5.6.1.1. Version 1.5.0

Quantization Tool version 1.5.0 introduces some changes to the API and the default behavior.

  1. The mixed_precision_search_space option was removed from the ConfigGenerator class.

    To produce a mixed-precision quantization configuration file, use the new MixedPrecisionConfigGenerator class.

    1. Please use mtk_quantization.pytorch.MixedPrecisionConfigGenerator class instead of mtk_quantization.pytorch.ConfigGenerator class.

    2. Please use mtk_quantization.tfv1.MixedPrecisionConfigGenerator class instead of mtk_quantization.tfv1.ConfigGenerator class.

    3. Please use mtk_generate_tfv1_mixed_precision_quantization_config_file executable instead of mtk_generate_tfv1_quantization_config_file executable.

  2. By default, example_input_data is now required when producing a PyTorch quantization configuration file.

    This is because Quantization Tool now includes UnionQuantizer targets in the configuration file by default, which requires example input data to do model structure analysis. To disable this behavior, users can set the ignore_union_quantizer_targets option to True.

5.6.2. TensorFlow V1 Quantization-Aware Training Tool

5.6.2.1. Python API

mtk_quantization.tfv1.ConfigGenerator(...)

Class that generates the quantization configuration.

mtk_quantization.tfv1.QuantizeHandler()

Class for quantization-aware training on TensorFlow v1.

mtk_quantization.tfv1.estimator.Context()

Class used to access the objects created in the wrapped model_fn.

mtk_quantization.tfv1.estimator.prepare_model_fn(...)

Prepare the model_fn for quantization-aware training.

class mtk_quantization.tfv1.ConfigGenerator(graph_def, input_names, input_shapes, output_names)

Class that generates the quantization configuration.

Parameters:
  • graph_def – A tf.GraphDef object. The TensorFlow model to analyze.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored.

  • input_shapes – A list of list of positive int values. The input tensor shapes.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored.

export_config(output_file, use_converter_tool=None, ignore_invalid_batch_norms=None)

Export the quantization configuration.

Parameters:
  • output_file – A str value. The output configuration file name.

  • use_converter_tool – A bool value. Whether to leverage the converter tool to generate the quantization configuration. Defaults to True.

  • ignore_invalid_batch_norms – A bool value. Whether to disable the checks for invalid batch normalization operations, such as unfused batch normalization operations or the batch normalization with dynamic training/evaluation modes. If False, will raise an exception in case these invalid batch normalization operations are found. Defaults to False.

classmethod from_frozen_graph_def(graph_def, input_names, input_shapes, output_names)

Create the ConfigGenerator object from frozen GraphDef object.

Parameters:
  • graph_def – The tf.GraphDef object. The TensorFlow graph to be analyzed.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored.

  • input_shapes – A list of list of positive int values. The input tensor shapes.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored.

Returns:

A ConfigGenerator object.

classmethod from_frozen_graph_def_file(graph_def_file, input_names, input_shapes, output_names)

Create the ConfigGenerator object from a frozen GraphDef file.

Parameters:
  • graph_def_file – A str value. The TensorFlow GraphDef model file to be analyzed.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored.

  • input_shapes – A list of list of positive int values. The input tensor shapes.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored.

Returns:

A ConfigGenerator object.

classmethod from_saved_model_dir(saved_model_dir, input_names=None, input_shapes=None, output_names=None, tag_set=None, signature_key=None)

Create the ConfigGenerator object from a SavedModel.

Parameters:
  • saved_model_dir – A str value. Path to the SavedModel directory.

  • input_names – A list of str values. The input tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., use input tensor names from SignatureDef.

  • input_shapes – A list of list of positive int values. The input tensor shapes. Defaults to None, i.e., use input tensor shapes from SignatureDef.

  • output_names – A list of str values. The output tensor names. Note that the :0 tensor name postfix can be ignored. Defaults to None, i.e., use output tensor names from SignatureDef.

  • tag_set – Set of tags identifying the MetaGraphDef within the SavedModel to analyze. Defaults to None, i.e., set(tf.saved_model.tag_constants.SERVING).

  • signature_key – Key identifying SignatureDef containing inputs and outputs. Defaults to None, i.e., tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY

Returns:

A ConfigGenerator object.

get_available_options()

Get the available option names.

Returns:

A list of str values. The available option names to configure the generator.

property activations_bitwidth

An int value. The quantization bitwidth for the activation quantizer targets. Defaults to 8.

 
property activations_quantizer_type

A str value. The quantizer type used for the activation quantizer targets. Should be one of [‘AllValuesQuantizer’, ‘ConstantQuantizer’, ‘EMAQuantizer’, ‘LastValueQuantizer’]. Defaults to ‘EMAQuantizer’.

 
property ignore_union_quantizer_targets

bool value. Whether to ignore quantizer targets with UnionQuantizer in the configuration. Note that the UnionQuantizer targets will always be ignored (regardless of the value of this argument) when use_converter_tool argument is set to False. Defaults to False.

 
property power_of_two_quantization_scale_rounding_type

A str value. The rounding function that is used to produce the power-of-two quantization scale. Should be one of [‘RoundUp’, ‘RoundDown’, ‘RoundNearest’]. Defaults to ‘RoundUp’.

 
property use_activations_symmetric_quantization

bool value. Whether to use symmetric quantization for the activation quantizer targets. Defaults to False.

 
property use_per_output_channel_quantization

bool value. Whether to use per-channel quantization for the constant weight quantizer targets. Defaults to True.

 
property use_power_of_two_quantization_scale

bool value. Whether to use power-of-two quantization scale. Defaults to False.

 
property use_weights_symmetric_quantization

bool value. Whether to use symmetric quantization for the constant weight quantizer targets. Defaults to True.

 
property weights_bitwidth

An int value. The quantization bitwidth for the constant weight quantizer targets. If not set, use the setting of activations_bitwidth. Defaults to None.

 
property weights_quantizer_type

A str value. The quantizer type used for the constant weight quantizer targets. Should be one of [‘AllValuesQuantizer’, ‘ConstantQuantizer’, ‘EMAQuantizer’, ‘LastValueQuantizer’]. Defaults to ‘LastValueQuantizer’.

class mtk_quantization.tfv1.QuantizeHandler

Class for quantization-aware training on TensorFlow v1.

By default, variables created by the QuantizeHandler object belong to a separate collection (QuantizeHandler.VAR_COLLECTION), and they can only be controlled via its member functions (e.g., init, save, and restore). If the add_to_global_collection argument of the prepare function is set to True, those variables are also added to the global variable collection, and can therefore be controlled through the standard TensorFlow mechanisms (e.g., the tf.train.Saver class).

Parameters:

max_to_keep – An int value. The maximum number of recent checkpoints to keep when calling the save function. Defaults to 5.

before_finalize()

Create the initialization and save/restore operations before the graph is finalized.

Note that this API is only required when the graph will be finalized (e.g., when using tf.estimator high-level training APIs). Otherwise, these operations are created on the fly when they are needed.

disable_all_quantizers(sess)

Disable all the quantizers.

If the quantizers are disabled, the model output will no longer be affected. However, for quantizers that are not updated based on the gradient value, the quantizer-related variables (such as min/max value) will be updated as usual.

Parameters:

sess – A tf.Session object. The session used to disable the quantizers.

enable_all_quantizers(sess)

Enable all the quantizers.

If the quantizers are enabled, the model output will be affected (i.e., the quantization impact will be simulated).

Parameters:

sess – A tf.Session object. The session used to enable the quantizers.

freeze_all_batch_norms(sess)

Freeze all the FusedBatchNorm operations that are in training mode.

Parameters:

sess – A tf.Session object. The session used to freeze the batchnorm operations.

freeze_all_quantizers(sess)

Freeze all the quantizers.

This will prevent all the quantizers from updating their min/max values. Note that for quantizers that depend on the gradient values to update the min/max values, this function only forces the gradient values to become zero.

Parameters:

sess – A tf.Session object. The session used to freeze the quantizers.

init(sess)

Initialize the variables related to the quantization-aware training.

Parameters:

sess – A tf.Session object. The session used to initialize the variables.

prepare(is_training, quant_config_file, graph=None, tensors_to_update=None, variable_sharing_scope_mappings=None, ignore_invalid_batch_norms=None, add_to_global_collection=None)

Prepare and update the graph for quantization-aware training.

Parameters:
  • is_training – A bool value. Whether the graph is used for training.

  • quant_config_file – A str value. The quantization configuration filename.

  • graph – A tf.Graph object. The graph to process. If None, use the default graph. Defaults to None.

  • tensors_to_update – A list of tf.Tensor objects. The tensors to be updated. Defaults to [].

  • variable_sharing_scope_mappings – A dict object. The mapping from the name scope used in the configuration file to the actual name scope of the tensors in the graph. Note that the variable sharing mechanism is enabled when this argument is used. That is, the new name scope is used to find the quantizer targets (falling back to the original name scope if it does not exist), while the original name scope is used to search for the variables created by the existing quantizers. An error is raised if such a variable cannot be found in the current graph. Defaults to None, i.e., no mappings are used.

  • ignore_invalid_batch_norms – A bool value. Whether to disable the checks for invalid batch normalization operations, such as unfused batch normalization operations or the batch normalization with dynamic training/evaluation modes. If False, will raise an exception in case these invalid batch normalization operations are found. Defaults to False.

  • add_to_global_collection – A bool value. Whether to add the quantization-aware training related variables to the global variable collection (tf.GraphKeys.GLOBAL_VARIABLES). Defaults to False.

Returns:

A list of tf.Tensor objects. The updated tensors corresponding to the tensors in the tensors_to_update argument. The returned list has the same length as the tensors_to_update argument.

restore(sess, save_path)

Restore the variables related to the quantization-aware training.

Parameters:
  • sess – A tf.Session object. The session used to restore the variables.

  • save_path – A str value. The prefix of filenames of the checkpoint.

save(sess, save_path, global_step=None)

Save the variables related to the quantization-aware training.

Parameters:
  • sess – A tf.Session object. The session used to save the variables.

  • save_path – A str value. The prefix of filenames of the checkpoint.

  • global_step – The global step that is passed to the underlying TensorFlow saver. Defaults to None.

unfreeze_all_batch_norms(sess)

Unfreeze all the FusedBatchNorm operations that are in training mode.

Parameters:

sess – A tf.Session object. The session used to unfreeze the batchnorm operations.

unfreeze_all_quantizers(sess)

Unfreeze all the quantizers.

Parameters:

sess – A tf.Session object. The session used to unfreeze the quantizers.

property init_op

Get the init operation for the variables related to the quantization-aware training.
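The typical flow with this class is: rewrite the graph with prepare, initialize the quantization-aware training variables inside a session, run training, and checkpoint those variables with save. A minimal sketch under that assumption follows; build_model_and_loss and the file names are placeholders.

import tensorflow as tf
import mtk_quantization

loss = build_model_and_loss()  # placeholder: user-defined TF v1 graph that returns a loss tensor

handler = mtk_quantization.tfv1.QuantizeHandler()
handler.prepare(is_training=True, quant_config_file='quant_config.txt')

train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    handler.init(sess)                      # initialize the QAT-related variables
    for _ in range(1000):
        sess.run(train_op)
    handler.save(sess, './qat_ckpt/model')  # checkpoint the QAT-related variables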

class mtk_quantization.tfv1.estimator.Context

Class used to access the objects created in the wrapped model_fn, including the QuantizeHandler and the EstimatorSpec.

Note that accessing the properties of a Context object returns the objects created under the current default TensorFlow graph in which the wrapped model_fn was called.

property call_count

The number of times the wrapped model_fn is called.

property eval_estimator_spec

The tf.estimator.EstimatorSpec returned by the wrapped model_fn under EVAL mode.

property eval_quantize_handler

The QuantizeHandler created in the wrapped model_fn under EVAL mode.

property predict_estimator_spec

The tf.estimator.EstimatorSpec returned by the wrapped model_fn under PREDICT mode.

property predict_quantize_handler

The QuantizeHandler created in the wrapped model_fn under PREDICT mode.

property train_estimator_spec

The tf.estimator.EstimatorSpec returned by the wrapped model_fn under TRAIN mode.

property train_quantize_handler

The QuantizeHandler created in the wrapped model_fn under TRAIN mode.

mtk_quantization.tfv1.estimator.prepare_model_fn(model_fn, quant_config_file, ignore_invalid_batch_norms=False)

Prepare the model_fn for quantization-aware training.

This function wraps the model_fn to create a mtk_quantization.tfv1.QuantizeHandler, and then performs prepare() and before_finalize(). The tf.estimator.EstimatorSpec returned by the model_fn is updated as well. Users can access the created QuantizeHandler via the returned mtk_quantization.tfv1.estimator.Context object.

Parameters:
  • model_fn – A callable. The model function for a tf.estimator.Estimator.

  • quant_config_file – A str value. The quantization configuration filename.

  • ignore_invalid_batch_norms – A bool value. Whether to disable the checks for invalid batch normalization operations, such as unfused batch normalization operations or the batch normalization with dynamic training/evaluation modes. If False, will raise an exception in case these invalid batch normalization operations are found. Defaults to False.

Returns:

A tuple of (mtk_quantization.tfv1.estimator.Context, callable). The context and the wrapped model_fn ready for quantization-aware training.
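A minimal sketch of wiring this into an Estimator follows; my_model_fn, my_train_input_fn, and the file paths are placeholders for the user's own artifacts.

import tensorflow as tf
import mtk_quantization

# Wrap an existing Estimator model_fn for quantization-aware training.
context, wrapped_model_fn = mtk_quantization.tfv1.estimator.prepare_model_fn(
    my_model_fn, 'quant_config.txt')

estimator = tf.estimator.Estimator(model_fn=wrapped_model_fn, model_dir='./qat_model')
estimator.train(input_fn=my_train_input_fn, max_steps=1000)

# The QuantizeHandler created inside the wrapped model_fn is reachable via the context.
handler = context.train_quantize_handler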

5.6.2.2. Executable

5.6.2.2.1. mtk_generate_tfv1_quantization_config_file

 

Generate quantization configuration file from TensorFlow v1 model

 

usage: mtk_generate_tfv1_quantization_config_file [-h]
                                                  [--input_frozen_graph_def_file INPUT_FROZEN_GRAPH_DEF_FILE]
                                                  [--input_saved_model_dir INPUT_SAVED_MODEL_DIR]
                                                  --output_file OUTPUT_FILE
                                                  [--input_names INPUT_NAMES]
                                                  [--input_shapes INPUT_SHAPES]
                                                  [--output_names OUTPUT_NAMES]
                                                  [--tag_set TAG_SET]
                                                  [--signature_key SIGNATURE_KEY]
                                                  [--activations_bitwidth ACTIVATIONS_BITWIDTH]
                                                  [--weights_bitwidth WEIGHTS_BITWIDTH]
                                                  [--ignore_union_quantizer_targets IGNORE_UNION_QUANTIZER_TARGETS]
                                                  [--use_activations_symmetric_quantization USE_ACTIVATIONS_SYMMETRIC_QUANTIZATION]
                                                  [--use_weights_symmetric_quantization USE_WEIGHTS_SYMMETRIC_QUANTIZATION]
                                                  [--use_power_of_two_quantization_scale USE_POWER_OF_TWO_QUANTIZATION_SCALE]
                                                  [--power_of_two_quantization_scale_rounding_type {RoundUp,RoundDown,RoundNearest}]
                                                  [--use_per_output_channel_quantization USE_PER_OUTPUT_CHANNEL_QUANTIZATION]
                                                  [--weights_quantizer_type {AllValuesQuantizer,ConstantQuantizer,EMAQuantizer,LastValueQuantizer,LogTholdQuantizer}]
                                                  [--activations_quantizer_type {AllValuesQuantizer,ConstantQuantizer,EMAQuantizer,LastValueQuantizer,LogTholdQuantizer}]
                                                  [--use_converter_tool USE_CONVERTER_TOOL]
                                                  [--ignore_invalid_batch_norms IGNORE_INVALID_BATCH_NORMS]

5.6.2.2.1.1. Named Arguments

--input_frozen_graph_def_file

Path to the GraphDef file to be analyzed.

--input_saved_model_dir

Path to the SavedModel directory to be analyzed.

--output_file

Path to the output configuration file.

--input_names

Input tensor names (comma separated). Note that the :0 tensor name postfix can be ignored.

--input_shapes

Input shapes (colon separated, and the dimensions are comma separated).

--output_names

Output tensor names (comma separated). Note that the :0 tensor name postfix can be ignored.

--tag_set

Set of tags (comma separated) identifying the MetaGraphDef within the SavedModel to analyze. Takes effect only when --input_saved_model_dir is set. Defaults to set(tf.saved_model.tag_constants.SERVING).

--signature_key

Key identifying the SignatureDef containing inputs and outputs. Takes effect only when --input_saved_model_dir is set. Defaults to tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY.

--activations_bitwidth

The quantization bitwidth for the activation quantizer targets. Should be in [2, 16]. Defaults to 8.

--weights_bitwidth

The quantization bitwidth for the constant weight quantizer targets. Should be in [2, 16]. If not set, use the setting of --activations_bitwidth. Defaults to None.

--ignore_union_quantizer_targets

Whether to ignore quantizer targets with UnionQuantizer in the configuration. Note that UnionQuantizer targets will always be ignored (regardless of the value of this argument) when use_converter_tool argument is set to False. Should be True or False. Defaults to False.

--use_activations_symmetric_quantization

Whether to use symmetric quantization for the activation quantizer targets. Should be True or False. Defaults to False.

--use_weights_symmetric_quantization

Whether to use symmetric quantization for the constant weight quantizer targets. Should be True or False. Defaults to True.

--use_power_of_two_quantization_scale

Whether to use power-of-two quantization scale. Should be True or False. Defaults to False.

--power_of_two_quantization_scale_rounding_type

Possible choices: RoundUp, RoundDown, RoundNearest

The rounding function that is used to produce power-of-two quantization scale. Should be one of ['RoundUp', 'RoundDown', 'RoundNearest']. Defaults to 'RoundUp'.

--use_per_output_channel_quantization

Whether to use per-channel quantization for the constant weight quantizer targets. Should be True or False. Defaults to True.

--weights_quantizer_type

Possible choices: AllValuesQuantizer, ConstantQuantizer, EMAQuantizer, LastValueQuantizer, LogTholdQuantizer

The quantizer type used for the constant weight quantizer targets. Should be one of ['AllValuesQuantizer', 'ConstantQuantizer', 'EMAQuantizer', 'LastValueQuantizer', 'LogTholdQuantizer']. Defaults to 'LastValueQuantizer'.

--activations_quantizer_type

Possible choices: AllValuesQuantizer, ConstantQuantizer, EMAQuantizer, LastValueQuantizer, LogTholdQuantizer

The quantizer type used for the activation quantizer targets. Should be one of ['AllValuesQuantizer', 'ConstantQuantizer', 'EMAQuantizer', 'LastValueQuantizer', 'LogTholdQuantizer']. Defaults to 'EMAQuantizer'.

--use_converter_tool

Whether to generate the configuration file based on the converter tool. If False, will use our pre-defined patterns to analyze the model. Should be True or False. Defaults to True.

--ignore_invalid_batch_norms

Whether to disable the checks for invalid batch normalization operations, such as unfused batch normalization operations or the batch normalization with dynamic training/evaluation modes. If False, will raise an exception in case these invalid batch normalization operations are found. Should be True or False. Defaults to False.

5.6.2.2.2. mtk_upgrade_tfv1_quantization_config_file

 

Upgrade the TensorFlow quantization configuration file to the latest version

 

usage: mtk_upgrade_tfv1_quantization_config_file [-h] input_file output_file

5.6.2.2.2.1. Positional Arguments

input_file

Path to the input configuration file

output_file

Path to the output configuration file

5.6.3. TensorFlow V2 Quantization-Aware Training Tool

5.6.3.1. Python API

mtk_quantization.tfv2.keras.ConfigGenerator(model)

Class that generates the quantization configuration.

mtk_quantization.tfv2.keras.QuantizeHandler()

Class for quantization-aware training on TensorFlow v2 Keras.

mtk_quantization.tfv2.keras.fuse_layers(model)

Search and fuse layers in the model for quantization-aware training.

mtk_quantization.tfv2.keras.quantize_scope(*args)

Scope that is used to deserialize quantized Keras models and layers.

class mtk_quantization.tfv2.keras.ConfigGenerator(model)

Class that generates the quantization configuration.

Parameters:

model – A tf.keras.Model object. The keras model to analyze.

export_config(output_file)

Export the quantization configuration.

Parameters:

output_file – A str value. The output configuration file name.

get_available_options()

Get the available option names.

Returns:

A list of str values. The available option names to configure the generator.

property activations_bitwidth

An int value. The quantization bitwidth for the activation quantizer targets. Defaults to 8.

property activations_quantizer_type

str value. The quantizer type used for the activation quantizer targets. Should be one of ['AllValuesQuantizer', 'ConstantQuantizer', 'EMAQuantizer', 'LastValueQuantizer']. Defaults to 'EMAQuantizer'.

property ignore_union_quantizer_targets

bool value. Whether to ignore UnionQuantizer in the configuration. Defaults to False.

property power_of_two_quantization_scale_rounding_type

str value. The rounding function that is used to produce power-of-two quantization scale. Should be one of ['RoundUp', 'RoundDown', 'RoundNearest']. Defaults to 'RoundUp'.

property use_activations_symmetric_quantization

bool value. Whether to use symmetric quantization for the activation quantizer targets. Defaults to False.

property use_per_output_channel_quantization

bool value. Whether to use per-channel quantization for the constant weight quantizer targets. Defaults to True.

property use_power_of_two_quantization_scale

bool value. Whether to use power-of-two quantization scale. Defaults to False.

property use_weights_symmetric_quantization

bool value. Whether to use symmetric quantization for the constant weight quantizer targets. Defaults to True.

property weights_bitwidth

An int value. The quantization bitwidth for the constant weight quantizer targets. If not set, use the setting of activations_bitwidth. Defaults to None.

property weights_quantizer_type

str value. The quantizer type used for the constant weight quantizer targets. Should be one of ['AllValuesQuantizer', 'ConstantQuantizer', 'EMAQuantizer', 'LastValueQuantizer']. Defaults to 'LastValueQuantizer'.
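A minimal sketch of generating a configuration file for a Keras model follows. Setting the options as plain attributes and the file name are assumptions for illustration; export_config and get_available_options are documented above.

import tensorflow as tf
import mtk_quantization

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

generator = mtk_quantization.tfv2.keras.ConfigGenerator(model)
print(generator.get_available_options())    # inspect the configurable options

generator.activations_bitwidth = 8           # assumed attribute-style assignment
generator.weights_bitwidth = 8
generator.export_config('quant_config.txt')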

class mtk_quantization.tfv2.keras.QuantizeHandler

Class for quantization-aware training on TensorFlow v2 Keras.

disable_all_quantizers()

Disable all the quantizers.

If the quantizers are disabled, the model output will no longer be affected. However, for quantizers that are not updated based on the gradient value, the quantizer-related variables (such as min/max value) will be updated as usual.

enable_all_quantizers()

Enable all the quantizers.

If the quantizers are enabled, the model output will be affected (i.e., the quantization impact will be simulated).

freeze_all_batch_norms()

Freeze all the BatchNormalization layers.

This will prevent all the BatchNormalization layers from updating the moving_mean and moving_variance values.

freeze_all_quantizers()

Freeze all the quantizers.

This will prevent all the quantizers from updating their min/max values. Note that for quantizers that depend on the gradient values to update the min/max values, this function only forces the gradient values to become zero.

prepare(model, quant_config_file)

Prepare and convert the input model for quantization-aware training.

Parameters:
  • model – A tf.keras.Model object. The keras model to analyze.

  • quant_config_file – A str value. The quantization configuration filename.

Returns:

The converted model for quantization-aware training.

unfreeze_all_batch_norms()

Unfreeze all the BatchNormalization layers.

unfreeze_all_quantizers()

Unfreeze all the quantizers.
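A minimal fine-tuning sketch with this class follows; the model, data, and file name are placeholders, and the freeze calls near the end reflect a common pattern rather than a required step.

import numpy as np
import tensorflow as tf
import mtk_quantization

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

handler = mtk_quantization.tfv2.keras.QuantizeHandler()
qat_model = handler.prepare(model, 'quant_config.txt')   # config produced by ConfigGenerator

qat_model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

x = np.random.rand(16, 32, 32, 3).astype('float32')      # placeholder training data
y = np.random.randint(0, 10, size=(16,))
qat_model.fit(x, y, epochs=1)

# Optionally freeze statistics for a final fine-tuning pass.
handler.freeze_all_batch_norms()
handler.freeze_all_quantizers()
qat_model.fit(x, y, epochs=1)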

mtk_quantization.tfv2.keras.fuse_layers(model, layer_pool=None)

Search and fuse layers in the model for quantization-aware training.

Parameters:
  • model – A tf.keras.Model object. The keras model to transform.

  • layer_pool – A set of names of layers to fuse. Layers whose names are not in layer_pool will not be fused. Defaults to None, i.e., all layers may be fused.

Returns:

The transformed keras model with fused layers.
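A short sketch of fusing layers before configuration generation and QAT preparation follows; the model and layer names are placeholders, and Conv2D + BatchNormalization is only an illustrative example of a pattern that may be fused.

import tensorflow as tf
import mtk_quantization

inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(8, 3, name='conv1')(inputs)
x = tf.keras.layers.BatchNormalization(name='bn1')(x)
x = tf.keras.layers.ReLU()(x)
model = tf.keras.Model(inputs, x)

# Fuse every supported pattern.
fused_model = mtk_quantization.tfv2.keras.fuse_layers(model)

# Or restrict fusion to specific layers by name.
fused_subset = mtk_quantization.tfv2.keras.fuse_layers(model, layer_pool={'conv1', 'bn1'})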

mtk_quantization.tfv2.keras.quantize_scope(*args)

Scope that is used to deserialize quantized Keras models and layers.

Under quantize_scope, Keras methods such as tf.keras.models.load_model or tf.keras.models.model_from_config are able to deserialize quantized models and layers that contain custom objects created by the quantization tool.

Example:

tf.keras.models.save_model(quantized_model, filepath)

with quantize_scope():
    tf.keras.models.load_model(filepath)

# If the quantized model contains other custom objects created by users, pass them to
# quantize_scope in order to deserialize the model
with quantize_scope({'MyDense': MyDense}):
    tf.keras.models.load_model(filepath)

See tf.keras.utils.custom_object_scope for more detail.

Parameters:

*args – Dictionary or dictionaries of {name: object} pairs.

Returns:

tf.keras.utils.CustomObjectScope object with quantize objects included.

5.6.3.2. Executable

5.6.3.2.1. mtk_upgrade_tfv2_quantization_config_file

 

Upgrade the TensorFlow v2 Keras quantization configuration file to the latest version

 

usage: mtk_upgrade_tfv2_quantization_config_file [-h] input_file output_file

5.6.3.2.1.1. Positional Arguments

input_file

Path to the input configuration file

output_file

Path to the output configuration file

5.6.4. PyTorch Quantization-Aware Training Tool

5.6.4.1. Python API

mtk_quantization.pytorch.ConfigGenerator(model)

Class that generates the quantization configuration.

mtk_quantization.pytorch.QuantizeHandler()

Class for quantization-aware training on PyTorch.

mtk_quantization.pytorch.fuse_modules(model, ...)

Search and fuse modules for quantization-aware training.

mtk_quantization.pytorch.functional.Add()

A wrapper module that is equivalent to torch.add(Tensor, Tensor).

mtk_quantization.pytorch.functional.Cat([dim])

A wrapper module that is equivalent to torch.cat.

mtk_quantization.pytorch.functional.Div()

A wrapper module that is equivalent to torch.div(Tensor, Tensor).

mtk_quantization.pytorch.functional.Mul()

A wrapper module that is equivalent to torch.mul(Tensor, Tensor).

mtk_quantization.pytorch.functional.Sub()

A wrapper module that is equivalent to torch.sub(Tensor, Tensor).

class mtk_quantization.pytorch.ConfigGenerator(model)

Class that generates the quantization configuration.

Parameters:

model – A torch.nn.Module object. The PyTorch model to analyze.

export_config(output_file, example_inputs=None)

Export the quantization configuration.

Note that users should provide example input data when the mixed_precision_search_space option is set.

Parameters:
  • output_file – A str value. The output configuration file name.

  • example_inputs – A torch.Tensor object or a list of torch.Tensor objects. The example input data for the given model.

get_available_options()

Get the available option names.

Returns:

A list of str values. The available option names to configure the generator.

property activations_bitwidth

An int value. The quantization bitwidth for the activation quantizer targets. Defaults to 8.

property activations_quantizer_type

str value. The quantizer type used for the activation quantizer targets. Should be one of ['AllValuesQuantizer', 'ConstantQuantizer', 'EMAQuantizer', 'LastValueQuantizer']. Defaults to 'EMAQuantizer'.

property ignore_union_quantizer_targets

bool value. Whether to ignore quantizer targets with UnionQuantizer in the configuration. Defaults to False.

property power_of_two_quantization_scale_rounding_type

str value. The rounding function that is used to produce power-of-two quantization scale. Should be one of ['RoundUp', 'RoundDown', 'RoundNearest']. Defaults to 'RoundUp'.

property use_activations_symmetric_quantization

bool value. Whether to use symmetric quantization for the activation quantizer targets. Defaults to False.

property use_per_output_channel_quantization

bool value. Whether to use per-channel quantization for the constant weight quantizer targets. Defaults to True.

property use_power_of_two_quantization_scale

bool value. Whether to use power-of-two quantization scale. Defaults to False.

property use_weights_symmetric_quantization

bool value. Whether to use symmetric quantization for the constant weight quantizer targets. Defaults to True.

property weights_bitwidth

An int value. The quantization bitwidth for the constant weight quantizer targets. If not set, use the setting of activations_bitwidth. Defaults to None.

property weights_quantizer_type

str value. The quantizer type used for the constant weight quantizer targets. Should be one of ['AllValuesQuantizer', 'ConstantQuantizer', 'EMAQuantizer', 'LastValueQuantizer']. Defaults to 'LastValueQuantizer'.
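A minimal sketch of generating a configuration file for a PyTorch model follows; the model, the attribute-style option assignment, and the file name are placeholders. Note that example_inputs is only strictly required when the mixed_precision_search_space option is set.

import torch
import mtk_quantization

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
)

generator = mtk_quantization.pytorch.ConfigGenerator(model)
generator.weights_bitwidth = 8    # assumed attribute-style assignment
generator.export_config('quant_config.txt',
                        example_inputs=torch.randn(1, 3, 32, 32))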

class mtk_quantization.pytorch.QuantizeHandler

Class for quantization-aware training on PyTorch.

disable_all_quantizers()

Disable all the quantizers.

If the quantizers are disabled, the model output will no longer be affected. However, for quantizers that are not updated based on the gradient value, the quantizer-related variables (such as min/max value) will be updated as usual.

enable_all_quantizers()

Enable all the quantizers.

If the quantizers are enabled, the model output will be affected (i.e., the quantization impact will be simulated).

freeze_all_batch_norms()

Freeze all the BatchNorm modules.

freeze_all_quantizers()

Freeze all the quantizers.

This will prevent all the quantizers from updating their min/max values. Note that for quantizers that depend on the gradient values to update the min/max values, this function only forces the gradient values to become zero.

prepare(model, quant_config_file, default_module_device=None)

Prepare and convert the input model for quantization-aware training.

Parameters:
  • model – A torch.nn.Module object. The input model to prepare.

  • quant_config_file – A str value. The quantization configuration filename.

  • default_module_device – A str value. The desired device name of all parameters and buffers in a module. This argument is only used when a module has no parameters and buffers. Defaults to ‘cpu’.

Returns:

The converted model for quantization-aware training.

unfreeze_all_batch_norms()

Unfreeze all the BatchNorm modules.

unfreeze_all_quantizers()

Unfreeze all the quantizers.
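A minimal fine-tuning sketch with this class follows; the model, the dummy loss, and the file name are placeholders.

import torch
import mtk_quantization

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)

handler = mtk_quantization.pytorch.QuantizeHandler()
qat_model = handler.prepare(model, 'quant_config.txt', default_module_device='cpu')

optimizer = torch.optim.SGD(qat_model.parameters(), lr=1e-3)
for _ in range(10):                          # stand-in fine-tuning loop
    x = torch.randn(4, 3, 32, 32)            # random data as a placeholder
    loss = qat_model(x).square().mean()      # dummy loss for illustration only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Optionally freeze statistics for the last stage of fine-tuning.
handler.freeze_all_batch_norms()
handler.freeze_all_quantizers()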

mtk_quantization.pytorch.fuse_modules(model, example_inputs, inplace=False)

Search and fuse modules for quantization-aware training.

The module fusion mechanism is based on the gradient graph produced from the example input data by the PyTorch Autograd engine. Therefore, we cannot support data-dependent control flow blocks.

Parameters:
  • model – A torch.nn.Module object. The input model containing modules to be fused.

  • example_inputs – A torch.Tensor object or a list of torch.Tensor objects. The example input data for the given model.

  • inplace – A bool value. Whether the fusion is done in-place. Defaults to False.

Returns:

The model with fused modules.
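A short sketch follows; the module and input shape are placeholders. Because the fusion search relies on the Autograd graph built from example_inputs, the example shape should match real model inputs.

import torch
import mtk_quantization

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3, padding=1)
        self.bn = torch.nn.BatchNorm2d(8)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = TinyNet()
fused_model = mtk_quantization.pytorch.fuse_modules(
    model, example_inputs=torch.randn(1, 3, 32, 32))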

class mtk_quantization.pytorch.functional.Add

A wrapper module that is equivalent to torch.add(Tensor, Tensor).

forward(x, y)

Run the module.

class mtk_quantization.pytorch.functional.Cat(dim=None)

A wrapper module that is equivalent to torch.cat.

Parameters:

dim – An int value. The dimension over which the tensors are concatenated. Defaults to 0.

forward(tensors)

Run the module.

Parameters:

tensors – A sequence of tensors to be concatenated.

class mtk_quantization.pytorch.functional.Div

A wrapper module that is equivalent to torch.div(Tensor, Tensor).

forward(x, y)

Run the module.

class mtk_quantization.pytorch.functional.Mul

A wrapper module that is equivalent to torch.mul(Tensor, Tensor).

forward(x, y)

Run the module.

class mtk_quantization.pytorch.functional.Sub

A wrapper module that is equivalent to torch.sub(Tensor, Tensor).

forward(x, y)

Run the module.
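These wrapper modules let elementwise tensor operations appear as modules, presumably so that the operations are visible to the quantization tool as quantizer targets. A short illustrative sketch follows; the block itself is a placeholder, and the import path is assumed from the dotted names above.

import torch
import mtk_quantization.pytorch.functional as qf

class ResidualBlock(torch.nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = torch.nn.Conv2d(channels, channels, 3, padding=1)
        self.add = qf.Add()   # use the wrapper module instead of a bare '+'

    def forward(self, x):
        return self.add(self.conv(x), x)

block = ResidualBlock(8)
out = block(torch.randn(1, 8, 16, 16))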

5.6.4.2. Executable

5.6.4.2.1. mtk_upgrade_pytorch_quantization_config_file

 

Upgrade the PyTorch quantization configuration file to the latest version

 

usage: mtk_upgrade_torch_quantization_config_file [-h] input_file output_file

5.6.4.2.1.1. Positional Arguments

input_file

Path to the input configuration file

output_file

Path to the output configuration file

5.7. TFLite Shim API Reference

struct TFLiteCustomOpExt

Public Members

const char *op_name
 
const char *target_name
 
const char *vendor_name
 
void *(*init)(TfLiteContext *context, const char *buffer, size_t length)
 
void (*free)(TfLiteContext *context, void *buffer)
 
TfLiteStatus (*prepare)(TfLiteContext *context, TfLiteNode *node)
 
TfLiteStatus (*add_params)(void*, ANeuralNetworksModel*, std::vector<uint32_t>&, uint32_t&)
 
struct TfLiteIntArray

Public Members

int size
 
int data[]
 
struct TFLiteTensorExt

Public Members

TFLiteTensorType type
 
int dimsSize
 
int dims[TFLITE_TENSOR_MAX_DIMENSTIONS]
 
void *buffer
 
size_t bufferSize
 
file NeuroPilotTFLiteShim.h

#include <dlfcn.h>

#include <vector>

Defines

TFLITE_TENSOR_MAX_DIMENSTIONS
 
TFLITE_LOG_D(format, ...)
 
 
LOAD_TFLITE_FUNCTION(name)
 
EXECUTE_TFLITE_FUNCTION(...)
 
EXECUTE_TFLITE_FUNCTION_RETURN_INT(...)
 
EXECUTE_TFLITE_FUNCTION_RETURN_BOOL(...)
 
EXECUTE_TFLITE_FUNCTION_RETURN_POINTER(...)
 

Typedefs

typedef struct ANeuralNetworksTFLite ANeuralNetworksTFLite
 
typedef struct NeuronModel NeuronModel
 
typedef struct ANeuralNetworksTFLiteOptions ANeuralNetworksTFLiteOptions
 
typedef struct ANeuralNetworksTFLiteTensor ANeuralNetworksTFLiteTensor
 
typedef struct TfLiteContext TfLiteContext
 
typedef uint32_t TFLiteBufferType
 
typedef uint32_t TFLiteTensorType
 
typedef uint32_t InferenceType
 
typedef uint32_t AccelerationMode
 
typedef struct TfLiteNode TfLiteNode
 
typedef int (*ANeuroPilotTFLiteOptions_create_fn)(ANeuralNetworksTFLiteOptions **options)
 
typedef int (*ANeuroPilotTFLiteOptions_free_fn)(ANeuralNetworksTFLiteOptions *options)
 
typedef int (*ANeuroPilotTFLiteOptions_setLowLatency_fn)(ANeuralNetworksTFLiteOptions *options, bool enableLowLatency)
 
typedef int (*ANeuroPilotTFLiteOptions_setDeepFusion_fn)(ANeuralNetworksTFLiteOptions *options, bool enableDeepFusion)
 
typedef int (*ANeuroPilotTFLiteOptions_setBatchProcessing_fn)(ANeuralNetworksTFLiteOptions *options, bool enableBatchProcessing)
 
typedef int (*ANeuroPilotTFLiteOptions_setWarmupRuns_fn)(ANeuralNetworksTFLiteOptions *options, uint32_t warmupRuns)
 
typedef int (*ANeuroPilotTFLiteOptions_setBoostHint_fn)(ANeuralNetworksTFLiteOptions *options, uint8_t boostValue)
 
typedef int (*ANeuroPilotTFLiteOptions_setBoostDuration_fn)(ANeuralNetworksTFLiteOptions *options, uint32_t boostDuration)
 
typedef int (*ANeuroPilotTFLiteOptions_setUseAhwb_fn)(ANeuralNetworksTFLiteOptions *options, bool use_ahwb)
 
typedef int (*ANeuroPilotTFLiteOptions_setAllowExtremePerformance_fn)(ANeuralNetworksTFLiteOptions *options, bool allow, uint32_t duration)
 
typedef int (*ANeuroPilotTFLiteOptions_setAllowFp16PrecisionForFp32_fn)(ANeuralNetworksTFLiteOptions *options, bool allow)
 
typedef int (*ANeuroPilotTFLiteOptions_resizeInputTensor_fn)(ANeuralNetworksTFLiteOptions *options, int32_t index, const int *dims, int32_t dimsSize)
 
typedef int (*ANeuroPilotTFLiteOptions_setAccelerationMode_fn)(ANeuralNetworksTFLiteOptions *options, AccelerationMode mode)
 
typedef int (*ANeuroPilotTFLiteOptions_setEncryptionLevel_fn)(ANeuralNetworksTFLiteOptions *options, int encryption_level)
 
typedef int (*ANeuroPilotTFLiteOptions_setCacheDir_fn)(ANeuralNetworksTFLiteOptions *options, const char *cache_dir)
 
typedef int (*ANeuroPilotTFLiteOptions_setPreference_fn)(ANeuralNetworksTFLiteOptions *options, int execution_preference)
 
typedef int (*ANeuroPilotTFLiteOptions_setDisallowNnApiCpu_fn)(ANeuralNetworksTFLiteOptions *options, bool disallow_nnapi_cpu)
 
typedef int (*ANeuroPilotTFLiteOptions_setCacheableIonBuffer_fn)(ANeuralNetworksTFLiteOptions *options, bool cacheable_ion_buffer)
 
typedef int (*ANeuroPilotTFLiteOptions_setUseIon_fn)(ANeuralNetworksTFLiteOptions *options, bool use_ion)
 
typedef int (*ANeuroPilotTFLiteOptions_setNoSupportedOperationCheck_fn)(ANeuralNetworksTFLiteOptions *options, bool no_supported_operation_check)
 
typedef int (*ANeuroPilotTFLiteOptions_setAcceleratorName_fn)(ANeuralNetworksTFLiteOptions *options, const char *accelerator_name)
 
typedef int (*ANeuroPilotTFLiteOptions_setExecutionPriority_fn)(ANeuralNetworksTFLiteOptions *options, int execution_priority)
 
typedef int (*ANeuroPilotTFLiteOptions_setMaxCompilationTimeout_fn)(ANeuralNetworksTFLiteOptions *options, uint64_t max_compilation_timeout_duration_ns)
 
typedef int (*ANeuroPilotTFLiteOptions_setMaxNumberDelegatedPartitions_fn)(ANeuralNetworksTFLiteOptions *options, uint32_t max_number_delegated_partitions)
 
typedef int (*ANeuroPilotTFLiteOptions_setMaxExecutionTimeout_fn)(ANeuralNetworksTFLiteOptions *options, uint64_t max_execution_timeout_duration_ns)
 
typedef int (*ANeuroPilotTFLiteOptions_setMaxExecutionLoopTimeout_fn)(ANeuralNetworksTFLiteOptions *options, uint64_t max_execution_loop_timeout_duration_ns)
 
typedef int (*ANeuroPilotTFLite_setBufferHandle_fn)(ANeuralNetworksTFLite *tflite, void **memory_data, TFLiteBufferType btype, int index, bool cacheable, int buffer_size)
 
typedef int (*ANeuroPilotTFLiteOptions_setCompileOptionByString_fn)(ANeuralNetworksTFLiteOptions *options, const char *compileOptions)
 
typedef int (*ANeuroPilotTFLite_create_fn)(ANeuralNetworksTFLite **tflite, const char *modelPath)
 
typedef int (*ANeuroPilotTFLite_createAdv_fn)(ANeuralNetworksTFLite **tflite, const char *modelPath, ANeuralNetworksTFLiteOptions *options)
 
typedef int (*ANeuroPilotTFLite_createWithBuffer_fn)(ANeuralNetworksTFLite **tflite, const char *buffer, size_t bufferSize)
 
typedef int (*ANeuroPilotTFLite_createNeuronModelWithBuffer_fn)(NeuronModel **neuron_model, const char *buffer, const size_t bufferSize, uint32_t *neuron_input_index, uint32_t *neuron_output_index, uint32_t *current_neuron_index)
 
typedef int (*ANeuroPilotTFLite_createAdvWithBuffer_fn)(ANeuralNetworksTFLite **tflite, const char *buffer, size_t bufferSize, ANeuralNetworksTFLiteOptions *options)
 
typedef int (*ANeuroPilotTFLite_createCustom_fn)(ANeuralNetworksTFLite **tflite, const char *modelPath, const std::vector<TFLiteCustomOpExt> &customOperations)
 
typedef int (*ANeuroPilotTFLite_createAdvCustom_fn)(ANeuralNetworksTFLite **tflite, const char *modelPath, const std::vector<TFLiteCustomOpExt> &customOperations, ANeuralNetworksTFLiteOptions *options)
 
typedef int (*ANeuroPilotTFLite_createCustomWithBuffer_fn)(ANeuralNetworksTFLite **tflite, const char *buffer, size_t bufferSize, const std::vector<TFLiteCustomOpExt> &customOperations)
 
typedef int (*ANeuroPilotTFLite_createAdvCustomWithBuffer_fn)(ANeuralNetworksTFLite **tflite, const char *buffer, size_t bufferSize, const std::vector<TFLiteCustomOpExt> &customOperations, ANeuralNetworksTFLiteOptions *options)
 
typedef int (*ANeuroPilotTFLite_getTensorCount_fn)(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int32_t *count)
 
typedef int (*ANeuroPilotTFLite_getTensorRank_fn)(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int index, int *rank)
 
typedef int (*ANeuroPilotTFLite_getTensorDimensions_fn)(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int index, int *dimensions)
 
typedef int (*ANeuroPilotTFLite_getTensorByteSize_fn)(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int index, size_t *size)
 
typedef int (*ANeuroPilotTFLite_getTensorType_fn)(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int index, TFLiteTensorType *ttype)
 
typedef int (*ANeuroPilotTFLite_setTensorBuffer_fn)(ANeuralNetworksTFLite *tflite, int index, char *data)
 
typedef int (*ANeuroPilotTFLite_setInputTensorData_fn)(ANeuralNetworksTFLite *tflite, int index, const void *data, size_t size)
 
typedef int (*ANeuroPilotTFLite_getOutputTensorData_fn)(ANeuralNetworksTFLite *tflite, int index, void *data, size_t size)
 
typedef int (*ANeuroPilotTFLite_getDequantizedOutputByIndex_fn)(ANeuralNetworksTFLite *tflite, void *buffer, size_t bufferByteSize, int tensorIndex)
 
typedef int (*ANeuroPilotTFLite_invoke_fn)(ANeuralNetworksTFLite *tflite)
 
typedef int (*ANeuroPilotTFLite_free_fn)(ANeuralNetworksTFLite *tflite)
 
typedef int (*ANeuroPilotTFLite_setAllowFp16PrecisionForFp32_fn)(ANeuralNetworksTFLite *tflite, bool allow)
 
typedef int (*ANeuroPilot_getInferencePreference_fn)(void)
 
typedef int (*ANeuroPilotTFLiteCustomOp_getIntAttribute_fn)(const char *buffer, size_t length, const char *attr, int32_t *outValue)
 
typedef int (*ANeuroPilotTFLiteCustomOp_getFloatAttribute_fn)(const char *buffer, size_t length, const char *attr, float *outValue)
 
typedef void *(*ANeuroPilotTFLiteCustomOp_getUserData_fn)(TfLiteNode *node)
 
typedef int (*ANeuroPilotTFLiteCustomOp_getInput_fn)(TfLiteContext *context, TfLiteNode *node, int index, TFLiteTensorExt *tfliteTensor)
 
typedef int (*ANeuroPilotTFLiteCustomOp_getOutput_fn)(TfLiteContext *context, TfLiteNode *node, int index, TFLiteTensorExt *tfliteTensor)
 
typedef int (*ANeuroPilotTFLiteCustomOp_resizeOutput_fn)(TfLiteContext *context, TfLiteNode *node, int index, TfLiteIntArray *new_size)
 
typedef TfLiteIntArray *(*ANeuroPilotTFLite_createIntArray_fn)(int size)
 
typedef int (*ANeuroPilotTFLite_freeIntArray_fn)(TfLiteIntArray *v)
 

Enums

enum ExecutionPreference

Values:

enumerator kUndefined
 
enumerator kLowPower
 
enumerator kFastSingleAnswer
 
enumerator kSustainedSpeed
 
enum TfLiteStatus

Values:

enumerator kTfLiteOk
 
enumerator kTfLiteError
 
enum NpTFLiteBufferType

Values:

enumerator TFLITE_BUFFER_TYPE_INPUT
 
enumerator TFLITE_BUFFER_TYPE_OUTPUT
 
enum NpTFLiteTensorType

Values:

enumerator TFLITE_TENSOR_TYPE_NONE
 
enumerator TFLITE_TENSOR_TYPE_FLOAT
 
enumerator TFLITE_TENSOR_TYPE_UINT8
 
enumerator TFLITE_TENSOR_TYPE_INT32
 
enumerator TFLITE_TENSOR_TYPE_INT64
 
enumerator TFLITE_TENSOR_TYPE_STRING
 
enumerator TFLITE_TENSOR_TYPE_BOOL
 
enumerator TFLITE_TENSOR_TYPE_INT16
 
enumerator TFLITE_TENSOR_TYPE_COMPLEX64
 
enumerator TFLITE_TENSOR_TYPE_INT8
 
enumerator TFLITE_TENSOR_TYPE_FLOAT16
 
enum NpInferenceType

Values:

enumerator NP_INFERENCE_TYPE_NONE
 
enumerator NP_INFERENCE_TYPE_QNAUT
 
enumerator NP_INFERENCE_TYPE_FLOAT
 
enum NpAccelerationMode

Values:

enumerator NP_ACCELERATION_CPU
 
enumerator NP_ACCELERATION_NNAPI
 
enumerator NP_ACCELERATION_NEURON
 

Functions

inline int32_t GetAndroidSdkVersion()
 
inline void *loadTFLiteLibrary(const char *name)
 
inline void *getTFLiteLibraryHandle()
 
inline void *loadTFLiteFunction(const char *name)
 
inline int ANeuralNetworksTFLiteOptions_create(ANeuralNetworksTFLiteOptions **options)

Create an ANeuralNetworksTFLiteOptions with default options.

ANeuralNetworksTFLiteOptions_free should be called once the object is no longer needed.

Parameters:

options – The ANeuralNetworksTFLiteOptions to be created. Set to NULL if unsuccessful.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuralNetworksTFLiteOptions_setAllowFp16PrecisionForFp32(ANeuralNetworksTFLiteOptions *options, bool allow)

Specifies whether ANeuralNetworksTFLiteOptions is allowed to be calculated with range and/or precision as low as that of the IEEE 754 16-bit floating-point format. This function is only used with float model. A float model is calculated with FP16 precision by default.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • allow – True to allow FP16 precision if possible.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline TfLiteIntArray *ANeuroPilotTFLiteWrapper_createIntArray(int size)

Create a TfLiteIntArray of the given size. Developers are expected to free the memory with ANeuroPilotTFLiteWrapper_freeIntArray.

Parameters:

size – The array size to be created.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuroPilotTFLiteWrapper_freeIntArray(TfLiteIntArray *v)

Free memory of array v.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuralNetworksTFLiteOptions_resizeInputTensor(ANeuralNetworksTFLiteOptions *options, int32_t index, const int *dims, int32_t dimsSize)

Change the dimensionality of a given input tensor.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • index – The index of the input tensor.

  • dims – List of the dimensions.

  • dimsSize – Number of the dimensions.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setAccelerationMode(ANeuralNetworksTFLiteOptions *options, AccelerationMode mode)

Set preferred acceleration mode.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • mode – Refer to NpAccelerationMode enum definition.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setCacheDir(ANeuralNetworksTFLiteOptions *options, const char *cache_dir)

Set compilation cache directory.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • cache_dir – User-defined cache directory.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setPreference(ANeuralNetworksTFLiteOptions *options, int execution_preference)

Set Execution Preference.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • execution_preference – Execution preference. Refer to the ExecutionPreference enum definition.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setDisallowNnApiCpu(ANeuralNetworksTFLiteOptions *options, bool disallow_nnapi_cpu)

Set Disallow NnApi Cpu.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • disallow_nnapi_cpu – True to disallow NNAPI CPU.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setCacheableIonBuffer(ANeuralNetworksTFLiteOptions *options, bool cacheable_ion_buffer)

This API is deprecated. Set cacheable ION buffer.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • cacheable_ion_buffer – Whether the ION buffer is cacheable.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setUseIon(ANeuralNetworksTFLiteOptions *options, bool use_ion)

Set use Ion

Available only in Neuron Delegate. Available in API level 30.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • use_ion – True to use ION.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setNoSupportedOperationCheck(ANeuralNetworksTFLiteOptions *options, bool no_supported_operation_check)

This API is deprecated. Set no supported operation check.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • no_supported_operation_check – True to disable the supported operation check.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setAcceleratorName(ANeuralNetworksTFLiteOptions *options, const char *accelerator_name)

Set Accelerator Name.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • accelerator_name – The accelerator name.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setExecutionPriority(ANeuralNetworksTFLiteOptions *options, int execution_priority)

Set Execution Priority.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • execution_priority – Execution priority. Refer to the corresponding enum definition.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setMaxCompilationTimeout(ANeuralNetworksTFLiteOptions *options, uint64_t max_compilation_timeout_duration_ns)

Set Max Compilation Timeout in NNAPI acceleration mode (NpAccelerationMode).

Available only in NNAPI Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • max_compilation_timeout_duration_ns – Maximum compilation timeout, in nanoseconds.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setMaxNumberDelegatedPartitions(ANeuralNetworksTFLiteOptions *options, uint32_t max_number_delegated_partitions)

Set Max number of delegated partitions in NNAPI.

Available only in NNAPI Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • max_number_delegated_partitions – Maximum number of delegated partitions.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setMaxExecutionTimeout(ANeuralNetworksTFLiteOptions *options, uint64_t max_execution_timeout_duration_ns)

Set Max Execution Timeout in NNAPI acceleration mode (NpAccelerationMode).

Available only in NNAPI Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • max_execution_timeout_duration_ns – Maximum execution timeout, in nanoseconds.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setMaxExecutionLoopTimeout(ANeuralNetworksTFLiteOptions *options, uint64_t max_execution_loop_timeout_duration_ns)

Set Max Execution Loop Timeout in NNAPI acceleration mode (NpAccelerationMode).

Available only in NNAPI Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • max_execution_loop_timeout_duration_ns – Maximum execution loop timeout, in nanoseconds.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setEncryptionLevel(ANeuralNetworksTFLiteOptions *options, int encryption_level)

This API is deprecated. Set encryption level.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • encryption_level – Encryption level. Refer to the NpEncryptionLevel enum definition.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setLowLatency(ANeuralNetworksTFLiteOptions *options, bool enableLowLatency)

Set the model optimization hint in Neuron acceleration mode (NpAccelerationMode). Allows maximizing the bandwidth utilization for low latency.

Available only in Neuron Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • enableLowLatency – True to allow low latency if possible.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setDeepFusion(ANeuralNetworksTFLiteOptions *options, bool enableDeepFusion)

Set the model optimization hint in Neuron acceleration mode (NpAccelerationMode). Allows deep fusion optimization. This may increase the model initialization time.

Available only in Neuron Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • enableDeepFusion – True to allow deep fusion if possible.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setCompileOptionByString(ANeuralNetworksTFLiteOptions *options, const char *compileOptions)

Set Neuron compile options.

Available since API level 31.

Available only in Neuron Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • compileOptions – The string of compile options.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setBatchProcessing(ANeuralNetworksTFLiteOptions *options, bool enableBatchProcessing)

Set the model optimization hint in Neuron acceleration mode (NpAccelerationMode). Allows batch optimization of models with an N dimension greater than 1. This may increase the model initialization time.

Available only in Neuron Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • enableBatchProcessing – True to allow batch processing if possible.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setWarmupRuns(ANeuralNetworksTFLiteOptions *options, uint32_t warmupRuns)

Set the number of warm-up runs to do after the ANeuroPilotTFLite instance is created. This may increase the model initialization time.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • warmupRuns – The number of warmup runs.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setBoostHint(ANeuralNetworksTFLiteOptions *options, uint8_t boostValue)

Set the model execution boost hint in Neuron acceleration mode (NpAccelerationMode).

When the execution preference is set to NEURON_PREFER_SUSTAINED_SPEED, the execution boost value equals the boost value hint. When the execution preference is set to NEURON_PREFER_LOW_POWER, the execution boost value does not exceed the boost value hint, in order to save power.

Available only in Neuron Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • boostValue – The hint for the device frequency, ranging from 0 (lowest) to 100 (highest).

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setBoostDuration(ANeuralNetworksTFLiteOptions *options, uint32_t boostDuration)

Set the model execution boost duration.

Available only in Neuron Delegate.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • boostDuration – Set boost duration in ms.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setUseAhwb(ANeuralNetworksTFLiteOptions *options, bool use_ahwb)

Set use AHardwareBuffer.

Available only in Neuron Delegate. Available since API level 31.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • use_ahwb – True to use AHardwareBuffer.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_setAllowExtremePerformance(ANeuralNetworksTFLiteOptions *options, bool allow, uint32_t duration)

This API is deprecated. Specifies whether to allow extreme performance acceleration of model execution in Neuron acceleration mode (NpAccelerationMode) with the fast-single-answer ExecutionPreference, by acquiring other system resources at the cost of increased power consumption.

This option is enabled by default and applies extreme performance for 2 seconds.

Parameters:
  • options – The ANeuralNetworksTFLiteOptions instance.

  • allow – True to apply extreme performance if possible.

  • duration – Apply extreme performance for the duration in milliseconds.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.

inline int ANeuralNetworksTFLiteOptions_free(ANeuralNetworksTFLiteOptions *options)

Delete a ANeuralNetworksTFLiteOptions object.

Destroys the object used by the run time to keep track of the memory. This will free the underlying actual memory if no other code has open handles to this memory.

Parameters:

options – The ANeuralNetworksTFLiteOptions object to be freed.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuroPilotTFLiteWrapper_makeTFLite(ANeuralNetworksTFLite **tflite, const char *modelPath)

Create an ANeuralNetworksTFLite with the TFlite model stored in a file.

This only creates the object. Computation is performed once ANeuroPilotTFLiteWrapper_invoke is invoked.

ANeuroPilotTFLiteWrapper_free should be called once the object is no longer needed.

Parameters:
  • tflite – The ANeuralNetworksTFLite to be created. Set to NULL if unsuccessful.

  • modelPath – The full path of the tflite model file.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_makeCustomTFLite(ANeuralNetworksTFLite **tflite, const char *modelPath, const std::vector<TFLiteCustomOpExt> &customOperations)

Create an ANeuralNetworksTFLite with the TFlite model stored in a file.

This only creates the object. Computation is performed once ANeuroPilotTFLiteWrapper_invoke is invoked.

ANeuroPilotTFLiteWrapper_free should be called once the object is no longer needed.

Parameters:
  • tflite – The ANeuralNetworksTFLite to be created. Set to NULL if unsuccessful.

  • modelPath – The full path of the tflite model file.

  • customOperations – Custom defined operation list.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_makeAdvTFLite(ANeuralNetworksTFLite **tflite, const char *modelPath, ANeuralNetworksTFLiteOptions *options)

Create an ANeuralNetworksTFLite with the TFlite model stored in a file.

This only creates the object. Computation is performed once ANeuroPilotTFLiteWrapper_invoke is invoked.

ANeuroPilotTFLiteWrapper_free should be called once the object is no longer needed.

Parameters:
  • tflite – The ANeuralNetworksTFLite to be created. Set to NULL if unsuccessful.

  • modelPath – The full path of the tflite model file.

  • options – Options of the ANeuralNetworksTFLite object.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_makeCustomTFLite(ANeuralNetworksTFLite **tflite, const char *modelPath, const std::vector<TFLiteCustomOpExt> &customOperations, ANeuralNetworksTFLiteOptions *options)

Create an ANeuralNetworksTFLite with the TFlite model stored in a file.

This only creates the object. Computation is performed once ANeuroPilotTFLiteWrapper_invoke is invoked.

ANeuroPilotTFLiteWrapper_free should be called once the object is no longer needed.

Parameters:
  • tflite – The ANeuralNetworksTFLite to be created. Set to NULL if unsuccessful.

  • modelPath – The full path of the tflite model file.

  • customOperations – Custom defined operation list.

  • options – Options of the ANeuralNetworksTFLite object.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_makeTFLiteWithBuffer(ANeuralNetworksTFLite **tflite, const char *buffer, size_t bufferSize)

Create an ANeuralNetworksTFLite with the TFLite model stored in a data buffer pointer. The data buffer is duplicated in the ANeuralNetworksTFLite instance, so the caller can free the input data buffer after calling this API.

This only creates the object. Computation is performed once ANeuroPilotTFLiteWrapper_invoke is invoked.

ANeuroPilotTFLiteWrapper_free should be called once the object is no longer needed.

Parameters:
  • tflite – The ANeuralNetworksTFLite to be created. Set to NULL if unsuccessful.

  • buffer – The pointer to the tflite model buffer.

  • bufferSize – The number of bytes of the tflite model buffer.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_makeNeuronModelWithBuffer(NeuronModel **neuron_model, const char *buffer, const size_t bufferSize, uint32_t *neuron_input_index, uint32_t *neuron_output_index, uint32_t *current_neuron_index)

Add TFLite ops to a pre-created Neuron model, and return the new Neuron model and Neuron indexes.

This API is used in CV+NN use cases.

Available since API level 31. Available only in Neuron Delegate.

Parameters:
  • neuron_model – Pre-created neuron model.

  • buffer – The pointer to the tflite model buffer.

  • bufferSize – The number of bytes of the tflite model buffer.

  • neuron_input_index – Neuron indexes of the tflite model inputs.

  • neuron_output_index – Neuron indexes of the tflite model outputs.

  • current_neuron_index – Returns the final Neuron index after the tflite ops are added.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_makeCustomTFLiteWithBuffer(ANeuralNetworksTFLite **tflite, const char *buffer, size_t bufferSize, const std::vector<TFLiteCustomOpExt> &customOperations)

Create an ANeuralNetworksTFLite with the TFLite model stored in a data buffer pointer. The data buffer is duplicated in the ANeuralNetworksTFLite instance, so the caller can free the input data buffer after calling this API.

This only creates the object. Computation is performed once ANeuroPilotTFLiteWrapper_invoke is invoked.

ANeuroPilotTFLiteWrapper_free should be called once the object is no longer needed.

Parameters:
  • tflite – The ANeuralNetworksTFLite to be created. Set to NULL if unsuccessful.

  • buffer – The pointer to the tflite model buffer.

  • bufferSize – The number of bytes of the tflite model buffer.

  • customOperations – Custom defined operation list.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_makeAdvTFLiteWithBuffer(ANeuralNetworksTFLite **tflite, const char *buffer, size_t bufferSize, ANeuralNetworksTFLiteOptions *options)

Create an ANeuralNetworksTFLite with the TFLite model stored in a data buffer pointer. The data buffer is duplicated in the ANeuralNetworksTFLite instance, so the caller can free the input data buffer after calling this API.

This only creates the object. Computation is performed once ANeuroPilotTFLiteWrapper_invoke is invoked.

ANeuroPilotTFLiteWrapper_free should be called once the object is no longer needed.

Parameters:
  • tflite – The ANeuralNetworksTFLite to be created. Set to NULL if unsuccessful.

  • buffer – The pointer to the tflite model buffer.

  • bufferSize – The number of bytes of the tflite model buffer.

  • options – Options of the ANeuralNetworksTFLite object.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_makeCustomTFLiteWithBuffer(ANeuralNetworksTFLite **tflite, const char *buffer, size_t bufferSize, const std::vector<TFLiteCustomOpExt> &customOperations, ANeuralNetworksTFLiteOptions *options)

Create an ANeuralNetworksTFLite with the TFLite model stored in a data buffer pointer. The data buffer is duplicated in the ANeuralNetworksTFLite instance, so the caller can free the input data buffer after calling this API.

This only creates the object. Computation is performed once ANeuroPilotTFLiteWrapper_invoke is invoked.

ANeuroPilotTFLiteWrapper_free should be called once the object is no longer needed.

Parameters:
  • tflite – The ANeuralNetworksTFLite to be created. Set to NULL if unsuccessful.

  • buffer – The pointer to the tflite model buffer.

  • bufferSize – The number of bytes of the tflite model buffer.

  • customOperations – Custom defined operation list.

  • options – Options of the ANeuralNetworksTFLite object.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the model can’t be parsed correctly.

inline int ANeuroPilotTFLiteWrapper_getDequantizedOutputByIndex(ANeuralNetworksTFLite *tflite, void *buffer, size_t bufferByteSize, int tensorIndex)

Store the dequantized contents of the given output tensor in a user-allocated buffer. This function is only used with quantized models.

Parameters:
  • tflite – The ANeuralNetworksTFLite to get dequantized data from the output tensor.

  • buffer – The pointer to the user-allocated buffer for storing dequantized contents.

  • bufferByteSize – Specifies the buffer size in bytes.

  • tensorIndex – Zero-based index of the output tensor.

Returns:

ANEURALNETWORKS_NO_ERROR if successful.
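
A sketch of reading output tensor 0 of a quantized model as dequantized values; outputElementCount is an illustrative name for the element count known from the model, float output and <vector> are assumed:

    std::vector<float> dequantized(outputElementCount);  // caller-allocated buffer
    int err = ANeuroPilotTFLiteWrapper_getDequantizedOutputByIndex(
        tflite, dequantized.data(), dequantized.size() * sizeof(float), /*tensorIndex=*/0);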

inline int ANeuroPilotTFLiteWrapper_invoke(ANeuralNetworksTFLite *tflite)

Invoke inference (run the whole graph in dependency order).

Parameters:

tflite – The ANeuralNetworksTFLite to invoke inference.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported. ANEURALNETWORKS_OP_FAILED if the operation failed.
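
A typical single-inference flow built from this call and the data-transfer helpers described below; inputData/outputData and their sizes are illustrative:

    // Copy input data in, run the whole graph, then copy output data out.
    if (ANeuroPilotTFLiteWrapper_setInputTensorData(tflite, /*index=*/0,
                                                    inputData, inputByteSize) != ANEURALNETWORKS_NO_ERROR) {
        // handle error
    }
    if (ANeuroPilotTFLiteWrapper_invoke(tflite) != ANEURALNETWORKS_NO_ERROR) {
        // handle error
    }
    if (ANeuroPilotTFLiteWrapper_getOutputTensorData(tflite, /*index=*/0,
                                                     outputData, outputByteSize) != ANEURALNETWORKS_NO_ERROR) {
        // handle error
    }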

inline int ANeuroPilotTFLiteWrapper_setBufferHandle(ANeuralNetworksTFLite *tflite, void **memory_data, TFLiteBufferType btype, int index, bool cacheable, int buffer_size)

Set an input/output tensor to use an AHardwareBuffer and get the buffer's virtual address.

Available since API level 31.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the input/output tensor.

  • memory_data – Receives the AHardwareBuffer virtual address.

  • btype – Input or output tensor.

  • index – Zero-based index of tensor.

  • cacheable – Whether the buffer is cacheable or non-cacheable.

  • buffer_size – The size of the buffer.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.
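
A hedged sketch binding input tensor 0 to an AHardwareBuffer-backed buffer and retrieving its virtual address; the input-side TFLiteBufferType enumerator is assumed here to be named TFLITE_BUFFER_TYPE_INPUT, and inputByteSize is illustrative:

    void *bufferVA = nullptr;  // receives the AHardwareBuffer virtual address
    int err = ANeuroPilotTFLiteWrapper_setBufferHandle(
        tflite, &bufferVA,
        TFLITE_BUFFER_TYPE_INPUT,  // assumed name of the input-side enumerator
        /*index=*/0,
        /*cacheable=*/true,
        inputByteSize);
    // On success, input data can be written directly through bufferVA.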

inline int ANeuroPilotTFLiteWrapper_free(ANeuralNetworksTFLite *tflite)

Delete an ANeuralNetworksTFLite object.

Destroys the object used by the runtime to keep track of the memory. This frees the underlying memory if no other code holds open handles to it.

Parameters:

tflite – The ANeuralNetworksTFLite object to be freed.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuroPilotTFLiteWrapper_getTensorCount(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int32_t *count)

Get the number of input/output tensors associated with the model.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the input/output tensor.

  • btype – Input or output tensor.

  • count – The number of input/output tensors.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuroPilotTFLiteWrapper_getTensorRank(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int index, int *rank)

Get the rank (number of dimensions) of the input/output tensor with the given index.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the input/output tensor.

  • btype – Input or output tensor.

  • index – Zero-based index of tensor.

  • rank – The rank of the tensor.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuroPilotTFLiteWrapper_getTensorDimensions(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int index, int *dimensions)

Get the dimensional information of the input/output tensor with the given index.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the input/output tensor.

  • btype – Input or output tensor.

  • index – Zero-based index of tensor.

  • dimensions – The dimension array to be filled. The size of the array must be exactly as large as the rank.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.
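
The three queries above are typically combined to walk a model's tensors; a sketch for the input side follows (input-side TFLiteBufferType enumerator name assumed, <vector> assumed included):

    int32_t inputCount = 0;
    ANeuroPilotTFLiteWrapper_getTensorCount(tflite, TFLITE_BUFFER_TYPE_INPUT, &inputCount);
    for (int i = 0; i < inputCount; ++i) {
        int rank = 0;
        ANeuroPilotTFLiteWrapper_getTensorRank(tflite, TFLITE_BUFFER_TYPE_INPUT, i, &rank);
        std::vector<int> dims(rank);  // must be exactly as large as the rank
        ANeuroPilotTFLiteWrapper_getTensorDimensions(tflite, TFLITE_BUFFER_TYPE_INPUT, i, dims.data());
    }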

inline int ANeuroPilotTFLiteWrapper_getTensorByteSize(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int index, size_t *size)

Get the size of the underlying data in bytes.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the input/output tensor.

  • btype – Input or output tensor.

  • index – Zero-based index of tensor.

  • size – The tensor’s size in bytes.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.
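
The byte size is commonly used to size the buffer handed to ANeuroPilotTFLiteWrapper_getOutputTensorData, sketched below (output-side TFLiteBufferType enumerator name assumed, <vector> assumed included):

    size_t outBytes = 0;
    ANeuroPilotTFLiteWrapper_getTensorByteSize(tflite, TFLITE_BUFFER_TYPE_OUTPUT, /*index=*/0, &outBytes);
    std::vector<char> outBuffer(outBytes);  // sized exactly to the tensor
    ANeuroPilotTFLiteWrapper_getOutputTensorData(tflite, /*index=*/0, outBuffer.data(), outBuffer.size());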

inline int ANeuroPilotTFLiteWrapper_getTensorType(ANeuralNetworksTFLite *tflite, TFLiteBufferType btype, int index, TFLiteTensorType *ttype)

Get the data type information of the input/output tensor with the given index.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the input/output tensor.

  • btype – Input or output tensor.

  • index – Zero-based index of tensor.

  • ttype – The tensor’s data type.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.
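
A brief sketch querying the data type of input tensor 0; the input-side TFLiteBufferType enumerator name is assumed as above, and the TFLiteTensorType values themselves come from the SDK headers:

    TFLiteTensorType ttype;
    if (ANeuroPilotTFLiteWrapper_getTensorType(
            tflite, TFLITE_BUFFER_TYPE_INPUT, /*index=*/0, &ttype) == ANEURALNETWORKS_NO_ERROR) {
        // Branch on ttype to choose between float and quantized input handling.
    }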

inline int ANeuroPilotTFLiteWrapper_setTensorBuffer(ANeuralNetworksTFLite *tflite, int index, char *data)

Set the data buffer of the tensor with the given index.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the input/output tensor.

  • index – Zero-based index of the tensor.

  • data – The buffer.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuroPilotTFLiteWrapper_setInputTensorData(ANeuralNetworksTFLite *tflite, int index, const void *data, size_t size)

Copies from the provided input buffer into the input tensor’s buffer.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the input/output tensor.

  • index – Zero-based index of the input tensor.

  • data – The input buffer.

  • size – The input buffer’s size in bytes.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuroPilotTFLiteWrapper_getOutputTensorData(ANeuralNetworksTFLite *tflite, int index, void *data, size_t size)

Copies to the provided output buffer from the output tensor’s buffer.

Parameters:
  • tflite – The ANeuralNetworksTFLite which holds the output tensor.

  • index – Zero-based index of the output tensor.

  • data – The output buffer.

  • size – The output buffer’s size in bytes.

Returns:

ANEURALNETWORKS_NO_ERROR if successful. ANEURALNETWORKS_BAD_STATE if NeuroPilot is not supported.

inline int ANeuroPilotWrapper_getInferencePreference(void)

Get the inference preference of the current platform.

Returns:

NP_INFERENCE_TYPE_NONE if NeuroPilot is not supported. NP_INFERENCE_TYPE_QUANT if quantization inference is preferred. NP_INFERENCE_TYPE_FLOAT if float inference is preferred.
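
The preference is typically queried once at start-up to decide which model variant to load; a minimal sketch:

    int pref = ANeuroPilotWrapper_getInferencePreference();
    if (pref == NP_INFERENCE_TYPE_FLOAT) {
        // Float inference preferred: load the float model.
    } else if (pref == NP_INFERENCE_TYPE_NONE) {
        // NeuroPilot is not supported on this platform.
    } else {
        // Quantization inference preferred: load the quantized model.
    }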

inline int ANeuroPilotTFLiteWrapper_getCustomOpIntAttribute(const char *buffer, size_t length, const char *attr, int32_t *outValue)
 
inline int ANeuroPilotTFLiteWrapper_getCustomOpFloatAttribute(const char *buffer, size_t length, const char *attr, float *outValue)
 
inline void *ANeuroPilotTFLiteWrapper_getCustomOpUserData(TfLiteNode *node)
 
inline int ANeuroPilotTFLiteWrapper_getCustomOpInput(TfLiteContext *context, TfLiteNode *node, int index, TFLiteTensorExt *tfliteTensor)
 
inline int ANeuroPilotTFLiteWrapper_getCustomOpOutput(TfLiteContext *context, TfLiteNode *node, int index, TFLiteTensorExt *tfliteTensor)
 
inline int ANeuroPilotTFLiteWrapper_resizeCustomOpOutput(TfLiteContext *context, TfLiteNode *node, int index, TfLiteIntArray *new_size)
 

Variables

static void *sTFLiteHandle