My Tensorflow collection - 2

Initial accuracy (test): 0.27


Epoch 8/8
22/22 [==============================] - 2s 76ms/step - loss: 0.5557 - accuracy: 0.8054 - val_loss: 0.3926 - val_accuracy: 0.8958
Training history: see how validation accuracy starts at chance level (~25% for 4 classes) and reaches ~90% after 8 epochs

Fine Tuning

As explained in part 1, the first layers in the convolutional base learn basic image features (edges, lines, ...) whereas the last layers learn features more specific to the problem at hand (here, 4 different faces).

Unfreezing some of the last layers of the convolutional base:

```python
# Unfreeze the whole base, then re-freeze everything before fine_tune_at.
# (Setting base_model.trainable = False first would keep the whole base
# frozen: a parent layer's trainable flag overrides its children's.)
base_model.trainable = True
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
```
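A fuller, self-contained sketch of the same idea (the tiny convolutional base, `fine_tune_at = 2` and the head are stand-ins for illustration; the article's real base comes from a pre-trained network): fine tuning also means recompiling with a much lower learning rate, so the pre-trained weights are nudged rather than destroyed.

```python
import tensorflow as tf

# Stand-in for the pre-trained convolutional base (the real one would be
# loaded from tf.keras.applications); fine_tune_at = 2 is an arbitrary example
base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation='relu'),
    tf.keras.layers.Conv2D(8, 3, activation='relu'),
])
fine_tune_at = 2

# Unfreeze the base, then re-freeze everything before fine_tune_at
base_model.trainable = True
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

# Recompile with a much lower learning rate for fine tuning
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation='softmax'),  # 4 classes (faces)
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
print('trainable layers in base:',
      sum(l.trainable for l in base_model.layers))
```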
Epoch 15/15
22/22 [==============================] - 3s 114ms/step - loss: 0.0124 - accuracy: 0.9986 - val_loss: 0.0070 - val_accuracy: 1.0000
Fine tuning for a few more epochs brings us to 100% accuracy. Note that we could have stopped training at epoch 13.
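Rather than hand-picking the epoch to stop at, an `EarlyStopping` callback can halt training once validation accuracy plateaus. A minimal sketch (the dataset names in the commented call are placeholders):

```python
import tensorflow as tf

# Stop automatically once val_accuracy stops improving
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    patience=2,                 # epochs without improvement before stopping
    restore_best_weights=True)  # roll back to the best epoch seen

# model.fit(train_ds, validation_data=val_ds, epochs=15,
#           callbacks=[early_stop])
```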


Now that the model is trained, we can run some inference (prediction). Let’s start with the x86 PC.
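An inference call looks like this. The real model would be loaded with `tf.keras.models.load_model(...)`; here a tiny stand-in model and a random image are used so the snippet runs on its own, and the 160x160 input size is an assumption.

```python
import numpy as np
import tensorflow as tf

# Stand-in classifier (load_model(...) would be used on the trained model)
model = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(input_shape=(160, 160, 3)),
    tf.keras.layers.Dense(4, activation='softmax'),  # 4 faces
])

img = np.random.rand(1, 160, 160, 3).astype(np.float32)  # dummy image batch
probs = model.predict(img, verbose=0)[0]
print('predicted class:', int(np.argmax(probs)))
```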

Benchmark results. Windows 10 21H2.

Turn on the Lite

In the test above, we ran our model on a laptop. However, laptops (or servers) are far from the only computing platforms out there. Think of mobile devices, or even very low-cost, embedded edge devices. Tensorflow Lite targets exactly these platforms, bringing:

  • latency & privacy (on-device inference, so no need for any cloud/internet connectivity)
  • reduced model size (reduced memory requirements)
  • and optimized power consumption (a modest CPU is enough)


The Python application available on GitHub trains a model using transfer learning, and then converts it to TFlite, with and without quantization (invoked with the -ptransfert -b options).
Training images are organized in sub-folders, one folder per class.
When training is completed, the models (as well as various training artifacts) are stored in the models directory.
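The conversion step can be sketched as follows. This uses a small stand-in model (the script converts the actual trained model), showing the plain fp32 conversion and the fp16-quantized variant discussed in the benchmark below:

```python
import tensorflow as tf

# Stand-in model; the real script converts the trained Keras model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(256,)),
    tf.keras.layers.Dense(4, activation='softmax'),
])

# fp32: straight conversion, no quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
fp32_model = converter.convert()

# fp16: half-precision weights (the "GPU" variant in the benchmark)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_model = converter.convert()

print('fp32:', len(fp32_model), 'bytes, fp16:', len(fp16_model), 'bytes')
```

As expected, the fp16 flatbuffer is roughly half the size of the fp32 one.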

Benchmarking TFlite

benchmarking TFlite on x86 CPU
  • fp32 is the TFlite model without quantization (i.e. using 32-bit floating point). The model size is significantly reduced (8 MB vs 16 MB) without impact on accuracy, and inference time is also improved
  • GPU uses 16-bit floating point quantization. As expected, the model size is half what it is with fp32
  • TPU uses 8-bit integer quantization. Again, the model size is decreased. However, inference time goes through the roof. To be honest, I am not sure what is going on here; I guess it has to do with the fact that x86 is not optimized to run int8 operations (after all, who in their right mind would use 8-bit on a 64-bit CPU?). Let’s try the 8-bit model on the Edge TPU accelerator
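For reference, the full-integer (int8) quantization that produces the "TPU" variant can be sketched like this. The representative dataset lets the converter calibrate activation ranges; random data and a tiny stand-in model are used here in place of real training images:

```python
import numpy as np
import tensorflow as tf

# Stand-in model; the real script quantizes the trained model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(4, activation='softmax'),
])

def representative_data():
    # Random samples stand in for real training images
    for _ in range(100):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force int8 ops end to end, as required by the Edge TPU compiler
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
int8_model = converter.convert()
```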

Edge TPU

To execute on the Edge TPU, the model needs to be converted from the TFlite format to the Edge TPU format, using the edgetpu_compiler (which is available only on Linux).

(from Linux)

```shell
$ edgetpu_compiler -s -m13 TPU.tflite
```

Output of edgetpu_compiler
The Edge TPU connects via USB 3.0.
Running the benchmark with the USB Edge TPU accelerator plugged in (Windows 10 21H2)
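Loading the compiled model through the accelerator looks roughly like this. This is a sketch only: it assumes the tflite_runtime package and the Edge TPU runtime are installed and the accelerator is plugged in, and the model path is illustrative. The delegate library names come from the Coral documentation.

```python
import platform

def make_edgetpu_interpreter(model_path='TPU_edgetpu.tflite'):
    """Load an Edge-TPU-compiled model through the USB accelerator.

    Requires tflite_runtime, the Edge TPU runtime, and the device itself.
    """
    import tflite_runtime.interpreter as tflite
    # The delegate shared-library name depends on the OS
    lib = {'Windows': 'edgetpu.dll',
           'Linux': 'libedgetpu.so.1',
           'Darwin': 'libedgetpu.1.dylib'}[platform.system()]
    interpreter = tflite.Interpreter(
        model_path=model_path,
        experimental_delegates=[tflite.load_delegate(lib)])
    interpreter.allocate_tensors()
    return interpreter
```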


Metadata can be included in the TFlite file. Among other things, this allows an application using the model to dynamically retrieve the model’s input and output format.
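Metadata proper is read with the tflite_support package, but the interpreter itself can already report input/output shapes and types, which is often all an application needs. A tiny model is converted in memory here so the snippet is self-contained:

```python
import tensorflow as tf

# Stand-in model, converted in memory
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation='softmax', input_shape=(8,)),
])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print('input :', inp['shape'], inp['dtype'])
print('output:', out['shape'], out['dtype'])
```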

Netron can display TFlite metadata

Even Lighter

Tensorflow can run on this guy (an ESP32, worth a few €)



pascal boudalier

Tinkering with Raspberry PI, ESP32, Solar, LifePo4, mppt, IoT, Zwave, energy harvesting, Python, MicroPython, Keras, Tensorflow, tflite, TPU. Ex Intel and HP