Object Detection

Learn how to easily train an AI to detect the exact position, dimension, and class of all the objects present within an image or video.


Before following this guide on Object Detection, make sure you completed the steps in the Get Started guide.

1. Create a new image dataset

Go to the Datasets section of Theos and click the + button inside the new image dataset card to add a new image dataset. Write a name for your new dataset and click confirm.

2. Drop images to upload

Collect example images that contain the classes of objects you want your AI to detect, and drop them into Theos. For almost all use cases, 100 images is a good starting point for an initial training and testing of your AI. You can always upload and label more images to keep increasing the accuracy and performance of your AI.

3. Upload your images

Click the Start upload button to upload your images to Theos.

4. Wait for your images to finish uploading

Please don't close this browser tab until the upload finishes, and make sure your computer does not go to sleep if you are uploading a large number of images.

5. Add a new class of object

Go to the bottom of the page and click the New class button to add a new class of object you want your AI to detect.

6. Confirm the new class

Write the name of the class and pick its color, then click the Confirm button to create it.

7. Finish adding all the classes you need

After adding all the classes you want your AI to detect, you can see their label statistics, also known as your dataset's class balance.

8. Start labeling your dataset

Click the Start labeling button in the top right corner to start teaching your AI what you want it to detect.

9. Select the class to label

Click on the class you want to label or press its shortcut number on your keyboard to select it.

10. Create a label

Place your mouse at one of the corners of the object you want to label, then click and drag to create a new bounding box. Release the mouse button once the box encloses the object as tightly as possible.

11. Fix your label

Each label must enclose its object as tightly and precisely as possible, meaning no room should be left between the bounding box and the contours of the object. If you accidentally made the bounding box bigger or smaller than the object, hold down the space key to enter transform mode (or click the hand icon in the top left corner) and fix your label. If the object is partially occluded by another object, make your best guess and draw the bounding box out to where you think the contour of the whole object is likely to be.
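
To make "tight" concrete, the sketch below measures how far a loose label drifts from a tight one using intersection over union (IoU), a standard overlap measure for bounding boxes. The box format (x_min, y_min, x_max, y_max) in pixels, the function name, and the example coordinates are illustrative assumptions, not part of Theos.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

tight = (100, 100, 200, 200)  # hugs the object contour
loose = (90, 90, 210, 210)    # 10 pixels of slack on every side

print(round(iou(tight, loose), 3))
```

In this example, 10 pixels of slack around a 100-pixel object already lowers the overlap with the tight box to roughly 0.69, which gives a sense of how quickly loose labels blur the position and size your AI is asked to learn.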

12. Finish labeling all the objects in the image

Label every object you want your AI to detect in this example image. Make sure no object is left unlabeled, as this will confuse your AI and hurt its performance in production. This is very important: a single unlabeled object can significantly impact the accuracy of your AI.

13. Submit your labels

After you have finished labeling the whole image, press the E shortcut key or click the Submit button in the bottom left corner to submit your labels. If the image does not contain any objects, press the Q shortcut key or click the Skip button to skip it.

14. Finish labeling the whole dataset

Finish labeling all the images in your dataset. You can inspect your dataset statistics at the bottom of its overview page. You should always strive for a balanced dataset, meaning that all your classes have roughly the same number of labels. The more labels a class has, the more examples your AI has to learn from. If another class has significantly fewer labels, your AI might mislabel it as the class with more labels or fail to recognize it at all. But don't worry if the intrinsic distribution of your data does not allow for this. For example, in this use case people will always have twice as many eyes as mouths, noses, and faces.
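
If you keep your own list of labels outside Theos, class balance can be checked with a few lines of code. This is a minimal sketch: the flat list with one class name per bounding box, the function name, and the counts are all hypothetical and only mirror the eyes, mouths, noses, and faces example.

```python
from collections import Counter

def class_balance(labels):
    """Return label counts per class and the largest-to-smallest class ratio."""
    counts = Counter(labels)
    if not counts:
        return counts, 0.0
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio

# Hypothetical labels: one class name per bounding box in the dataset.
labels = ["eye"] * 200 + ["mouth"] * 100 + ["nose"] * 100 + ["face"] * 100

counts, ratio = class_balance(labels)
for name, count in counts.most_common():
    print(f"{name}: {count}")
print(f"largest/smallest ratio: {ratio:.1f}")
```

A ratio close to 1 means the classes are balanced. Here the ratio of 2 comes from the data itself, since every face has two eyes, so it is not something more labeling could fix.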

15. Create a new training session

You are now ready to train your AI. Go to the Train section of Theos and click the New training button. Write a name for it and click confirm to create your new training session.

16. Configure your training session

Each training session is composed of the neural network's algorithm version taken from the Library, your Dataset, and the Machine that will perform the neural network training. Each training session contains a set of experiments, and each experiment is an attempt at training your AI on your dataset.

Choose the algorithm

Currently, Theos supports all versions of the state-of-the-art YOLOv5 object detector. The extra large version is the most accurate, but it also demands more computational power, so it takes longer to train and longer to process each image when deployed. For blazing fast, real-time speeds, choose a smaller version.

Choose the dataset

Select the dataset you want your AI to learn from.

Choose the machine

If you have one of our professional plans, choose one of your always-connected, ready-to-use Theos Cloud Machines, which come with powerful NVIDIA GPUs for lightning fast training. Otherwise, click the + button inside the add machine card to connect your own on-premise GPU machine, or click the Use google colab button to use one of Google Colab's free GPUs.

17. Set your experiment's training configuration

An Epoch is the act of your AI going through the entire dataset and attempting to predict all the labels you created during the labeling process. The first time this happens, your AI will likely fail to correctly predict the class, position and dimensions of almost all your labels. This is why we must let our AI make many attempts, so it will learn from its mistakes. It is common practice to set a few hundred epochs per experiment. For most cases 300 epochs will be fine for an initial training.

The Batch size is the number of images your AI will predict in parallel. The higher this number, the less time each epoch will take to complete, but also the more GPU memory it will require. Theos Cloud Machines come with 16GB of GPU memory. If you are on the free plan, you may need to experiment with this value to avoid running out of GPU memory. But don't worry, Theos will let you know if this happens and let you change the value so you can restart your training experiment.
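
The relationship between dataset size, batch size, and epochs is simple arithmetic, sketched below. The 100 images and 300 epochs are the starting points suggested in this guide, while the batch sizes and the function name are arbitrary illustrative choices, not Theos recommendations.

```python
import math

def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to go through the whole dataset once."""
    return math.ceil(num_images / batch_size)

num_images = 100  # starting dataset size suggested in this guide
epochs = 300      # initial epoch count suggested in this guide

for batch_size in (8, 16, 32):  # illustrative values only
    steps = steps_per_epoch(num_images, batch_size)
    print(f"batch size {batch_size}: {steps} steps per epoch, {steps * epochs} steps in total")
```

A larger batch size means fewer steps per epoch, but each step holds more images in GPU memory at once, which is the trade-off described above.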

At the end of each completed epoch, the machine will upload to Theos a checkpoint of your AI's current knowledge, in what is called a Weights file. The weights are the representation of the strengths of all the neural connections in the brain of your AI. For each experiment, Theos saves the Last epoch's weights as well as the weights generated in the epoch where the Best performance was achieved (because your AI may reach maximum accuracy at, for example, epoch 185, but start to degrade its performance later due to Overfitting). Later, when you decide to deploy your AI, you will have to choose which weights you want your AI to use.
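
The difference between the Last and Best weights can be sketched as a small bookkeeping loop. The fitness values below are made up for illustration, and the function is a hypothetical stand-in for what a training run tracks, not Theos code.

```python
def track_checkpoints(fitness_per_epoch):
    """Return (best_epoch, last_epoch), with epochs numbered from 1."""
    best_epoch, best_fitness = None, float("-inf")
    last_epoch = None
    for epoch, fitness in enumerate(fitness_per_epoch, start=1):
        last_epoch = epoch          # "Last" is overwritten after every epoch
        if fitness > best_fitness:  # "Best" only changes when fitness improves
            best_epoch, best_fitness = epoch, fitness
    return best_epoch, last_epoch

# Hypothetical fitness curve: improves, peaks at epoch 4, then degrades.
fitness_history = [0.21, 0.35, 0.48, 0.55, 0.53, 0.50]

best, last = track_checkpoints(fitness_history)
print(f"best weights: epoch {best}, last weights: epoch {last}")
```

When the two epochs differ, as they do here, the later epochs did not improve on the peak, which is the overfitting situation described above.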

Finally, you can also set Initial weights if you want your AI to start with the knowledge of a previously trained AI, instead of starting from scratch. This will make it achieve good accuracy in fewer epochs if the previous knowledge is sufficiently transferable to your current dataset.

18. Start training

Click the Start training button to make your AI learn from your dataset examples.

19. Wait for the training experiment to finish

Now you are free for a while. Go grab a cup of coffee or watch a movie: your AI has started training and you just have to wait for it to finish.

20. Monitor training progress and metrics

If you want, you can check the training progress and metrics once in a while. New metric values will stream directly to your browser once per minute of training, so you can monitor your AI learning in real-time.

The main metric to watch is the fitness of your AI. It represents how good your AI is at predicting the class of your labels, as well as their position and dimensions. Its value goes from 0 to 1, and higher is better. Generally, a good enough object detector requires a fitness of 0.5 or above. This is the value used to determine whether a given weights file is the Best one of the experiment. You can safely ignore most other metrics for now; we will talk about them in a future neural network debugging guide.
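
For reference, the open-source YOLOv5 code computes fitness as a weighted sum of four validation metrics, with all of the weight on the two mAP values. The sketch below reproduces that weighting. Whether Theos reports exactly the same quantity is an assumption, and the metric values in the example are made up.

```python
# Weights used by the open-source YOLOv5 code for
# [precision, recall, mAP@0.5, mAP@0.5:0.95].
YOLOV5_FITNESS_WEIGHTS = (0.0, 0.0, 0.1, 0.9)

def fitness(precision, recall, map_50, map_50_95, weights=YOLOV5_FITNESS_WEIGHTS):
    """Weighted combination of validation metrics, each between 0 and 1."""
    metrics = (precision, recall, map_50, map_50_95)
    return sum(w * m for w, m in zip(weights, metrics))

# Hypothetical metrics from one epoch.
print(round(fitness(precision=0.80, recall=0.75, map_50=0.78, map_50_95=0.52), 3))
```

Under this weighting, fitness is dominated by mAP@0.5:0.95, a strict metric that rewards precise box positions and dimensions. That is consistent with a value around 0.5 already indicating a usable detector.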

21. Training has finished

Your AI has finished training. You can now review all the training metrics one more time before deploying your AI into production to test it and finally integrate it with your software.

22. Create a new deployment

Go to the Deploy section of Theos and click the New deployment button to deploy your AI into a highly scalable REST API. Write a name for your deployment and click confirm.

23. Configure your deployment

Choose the algorithm

Choose the algorithm version you used to train your AI.

Choose the weights

Choose which weights you want your AI to use.

24. Deploy your AI

Click the Finish button to deploy your AI to a highly scalable REST API. Your AI should be deployed within a few minutes.

25. Try out your AI inside the playground

Drag and drop an image to Theos and click the Detect button to try your AI.

26. Use your AI in your software

Start using your AI in your software by making simple HTTP post requests to your deployment's URL.

The request has 6 possible fields:

  • image (required): the binary data of an image or video frame.

  • conf_thres (optional): the Minimum confidence value configurable in the Playground. Possible values go from 0 to 1.

  • iou_thres (optional): the Detection recall value configurable in the Playground. Possible values go from 0 to 1.

  • ocr_model (optional): the Text Recognition Model value configurable in the Playground. Possible values are small, medium or large.

  • ocr_classes (optional): the comma-separated names of the classes on which to perform OCR. For example: license-plate, billboard, signature.

  • ocr_language (optional): if the ocr_model is small, you can set the target language for reading the special characters of a particular language. If unspecified, the default language is English. See the language code list to find your language of choice. Example for reading German: "ocr_language":"deu".
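
To build intuition for conf_thres and iou_thres, the sketch below assumes they behave like the confidence and non-maximum suppression thresholds of the same names in open-source YOLOv5. Low-confidence boxes are discarded first, then any box that overlaps a higher-scoring kept box by more than iou_thres is dropped as a duplicate. This is a simplified, class-agnostic illustration with made-up boxes and hypothetical function names, not the code that runs inside a Theos deployment.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(detections, conf_thres=0.25, iou_thres=0.45):
    """Keep confident boxes, then drop boxes that overlap a higher-scoring kept box."""
    candidates = sorted(
        (d for d in detections if d["confidence"] >= conf_thres),
        key=lambda d: d["confidence"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        if all(iou(det["box"], k["box"]) <= iou_thres for k in kept):
            kept.append(det)
    return kept

raw = [
    {"box": (100, 100, 200, 200), "confidence": 0.90},
    {"box": (105, 105, 205, 205), "confidence": 0.60},  # near-duplicate of the first box
    {"box": (300, 300, 380, 380), "confidence": 0.10},  # below conf_thres
    {"box": (400, 100, 480, 180), "confidence": 0.70},
]

kept = filter_detections(raw)
print([d["confidence"] for d in kept])
```

Under this assumption, raising iou_thres lets more overlapping boxes through, which yields more detections at the risk of duplicates. Raising conf_thres removes uncertain detections at the risk of missing objects.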


For Linux and macOS (replace YOUR_DEPLOYMENT_URL with your deployment's URL):

curl YOUR_DEPLOYMENT_URL \
     -F "image=@image.jpg" \
     -F "conf_thres=0.25" \
     -F "iou_thres=0.45" \
     -X POST

For Windows (replace YOUR_DEPLOYMENT_URL with your deployment's URL):

curl YOUR_DEPLOYMENT_URL ^
     -F "image=@image.jpg" ^
     -F "conf_thres=0.25" ^
     -F "iou_thres=0.45" ^
     -X POST


We will use the requests package to make HTTP post requests.

pip install requests

Add the following code to your software to send an image to your AI and receive back its detections.

import requests
import json
import time

URL = '' # copy and paste your URL here
FALLBACK_URL = '' # copy and paste your fallback URL here
IMAGE_PATH = './image.jpg'

def detect(image_path, url=URL, conf_thres=0.25, iou_thres=0.45, ocr_model=None, ocr_classes=None, ocr_language=None, retries=10, delay=0):
    # Build the form fields; the OCR fields are only sent when an OCR model is selected.
    data = {'conf_thres': conf_thres, 'iou_thres': iou_thres}
    if ocr_model is not None:
        data.update({'ocr_model': ocr_model, 'ocr_classes': ocr_classes, 'ocr_language': ocr_language})
    # Open the image in a with block so the file is closed after the request.
    with open(image_path, 'rb') as image:
        response = requests.post(url, data=data, files={'image': image})
    if response.status_code in [200, 500]:
        data = response.json()
        if 'error' in data:
            print('[!]', data['message'])
        else:
            return data
    elif response.status_code == 403:
        print('[!] you reached your monthly requests limit. Upgrade your plan to unlock unlimited requests.')
    elif retries > 0:
        # Wait before retrying, then use the fallback URL if one is set.
        if delay > 0:
            time.sleep(delay)
        return detect(image_path, url=FALLBACK_URL if FALLBACK_URL else URL, conf_thres=conf_thres, iou_thres=iou_thres, ocr_model=ocr_model, ocr_classes=ocr_classes, ocr_language=ocr_language, retries=retries-1, delay=2)
    return []

detections = detect(IMAGE_PATH)

if len(detections) > 0:
    print(json.dumps(detections, indent=2))
else:
    print('no objects found.')
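
Once you have the detections, you will usually want to post-process them. The sketch below groups them by class and keeps only the confident ones. It assumes each detection is a dictionary with the class, confidence, x, y, width, height, and optional text fields that the React example further down reads. The sample values, the min_confidence parameter, and the function name are made up for illustration.

```python
from collections import defaultdict

def summarize(detections, min_confidence=0.5):
    """Group detections by class, keeping those at or above min_confidence."""
    by_class = defaultdict(list)
    for det in detections:
        if det["confidence"] >= min_confidence:
            by_class[det["class"]].append(det)
    return dict(by_class)

# Hypothetical response; the values are made up for illustration.
sample = [
    {"class": "license-plate", "confidence": 0.91, "x": 120, "y": 340, "width": 180, "height": 60, "text": "ABC123"},
    {"class": "billboard", "confidence": 0.42, "x": 10, "y": 20, "width": 400, "height": 200},
    {"class": "license-plate", "confidence": 0.77, "x": 520, "y": 310, "width": 170, "height": 55},
]

summary = summarize(sample)
for name, dets in summary.items():
    texts = [d["text"] for d in dets if "text" in d]
    print(f"{name}: {len(dets)} detection(s), text read: {texts}")
```

Filtering on the client like this is an alternative to sending a higher conf_thres with the request. It lets you try several cut-offs on the same response without calling the API again.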


We will use the axios package to make HTTP post requests.

npm install axios

Finally, create the following component and import it in your app to send an image to your AI and receive back its detections.

import React, { useState } from 'react';
import axios from 'axios';

const URL = ''; // copy and paste your URL here
const FALLBACK_URL = ''; // copy and paste your fallback URL here

function sleep(seconds) {
  return new Promise((resolve) => setTimeout(resolve, seconds * 1000));
}

// Sends an image to the deployment and resolves with its detections.
async function detect({imageFile, url=URL, confThres=0.25, iouThres=0.45, ocrModel=undefined, ocrClasses=undefined, ocrLanguage=undefined, retries=10, delay=0}={}) {
  const data = new FormData();
  data.append('image', imageFile);
  data.append('conf_thres', confThres);
  data.append('iou_thres', iouThres);
  if(ocrModel !== undefined){
    data.append('ocr_model', ocrModel);
  }
  if(ocrClasses !== undefined){
    data.append('ocr_classes', ocrClasses);
  }
  if(ocrLanguage !== undefined){
    data.append('ocr_language', ocrLanguage);
  }
  try {
    const response = await axios({ method: 'post', url: url, data: data, headers:{'Content-Type':'multipart/form-data'}});
    return response.data;
  } catch (error) {
    if (error.response) {
      if(error.response.status === 0 || error.response.status === 413) throw new Error('image too large, please select an image smaller than 25MB.');
      else if(error.response.status === 403) throw new Error('you reached your monthly requests limit. Upgrade your plan to unlock unlimited requests.');
      else if(error.response.data) throw new Error(error.response.data.message);
      else throw error;
    } else if (retries > 0) {
      // No response received: wait, then retry on the fallback URL if one is set.
      if (delay > 0) await sleep(delay);
      return await detect({imageFile, url: FALLBACK_URL ? FALLBACK_URL : URL, confThres, iouThres, ocrModel, ocrClasses, ocrLanguage, retries: retries-1, delay: 2});
    } else {
      return [];
    }
  }
}

function TheosAPI() {
  const [detecting, setDetecting] = useState(false);
  const [detected, setDetected] = useState(false);
  const [detections, setDetections] = useState('');
  const [error, setError] = useState('');

  function onFileSelected(event) {
    const file = event.target.files[0];
    // Reset the state, then send the selected image to the AI.
    setDetecting(true);
    setDetected(false);
    setError('');
    detect({imageFile: file})
      .then(detections => {
        setDetections(detections.length > 0? `${detections.length} OBJECTS FOUND\n${detections.map((detection, index) => ` ${'_'.repeat(30)}\n|\n| ${index+1}. ${detection.class}\n|\n|${'‾'.repeat(30)}\n|  ‣ confidence: ${detection.confidence*100}%\n|  ‣ x: ${detection.x}\n|  ‣ y: ${detection.y}\n|  ‣ width: ${detection.width}\n|  ‣ height: ${detection.height}\n|${'text' in detection? '  ‣ text: ' + detection.text:''}\n ${'‾'.repeat(30)}\n`).join('')}`: 'No objects found.');
        setDetected(true);
        setDetecting(false);
      })
      .catch(error => {
        setError(error.message);
        setDetecting(false);
      });
  }

  return (
    <div style={{ padding: '20px' }}>
      <h1>Theos API</h1>
      {detecting ? <h3>Detecting...</h3> : <div><label htmlFor='file-upload' style={{cursor:'pointer', display:'inline-block', padding:'8px 12px', borderRadius: '5px', border:'1px solid #ccc'}}>Click to select an image</label><input id='file-upload' type='file' accept='image/*' onChange={onFileSelected} style={{display:'none'}}/></div>}
      {detected && <h3><pre>{detections}</pre></h3>}
      {error && <h3 style={{color:'red'}}>{error}</h3>}
    </div>
  );
}

export default TheosAPI;

27. Improve your AI

Now you should continue to add more examples to your dataset and retrain your AI to improve its accuracy. After you have deployed your AI, you can use our magical Autolabeler to label new images 100 times faster. Let your AI help you create a better version of itself.
