Birds Classification Using Deep Learning


Abstract

The aim of this project is to classify 225 species of birds using a deep learning model for image classification. To do so, we applied machine learning techniques for image recognition and classification. We first addressed the classification problem by developing a pipeline. Our proposed solution was to use better features, extracted by passing each image through a pre-trained convolutional neural network; we used the Inception network for this. With these features, we obtained very high accuracies.

We also used data augmentation techniques to generate more images from a small set of images by applying different transformations, such as zooming and shearing. We obtained very promising results in both the classification and object detection tasks.

Introduction

In some European countries, an alarming number of rare birds have been accidentally killed by wind turbines, which has led to the creation of startups that propose solutions to this problem. One of these solutions is based on image acquisition and processing systems: cameras handle the detection, and a classifier distinguishes the different species. Our project consists of building a system that can classify at least 225 species of birds. There are many approaches in computer vision for this, each with its advantages and disadvantages. We will present some of the methods you can use for this kind of problem and, at the end, explain which method we chose for our case.

Methods and Approaches

In this section, we present some methods that are used for image recognition, and then explain why we chose deep learning for our project.

SVM

Support Vector Machines (SVMs) were developed in the framework of statistical learning theory. They work well on smaller datasets and achieve high prediction accuracy. However, they are not well suited to larger datasets: training can take a long time, and they are less effective on datasets containing noise.

Boosting

Boosting is also a machine learning method. It is flexible in the choice of cost functions and adaptable to the specifics of the problem studied. However, it produces a non-explicit model, and numerous parameters must be set (tree size, number of iterations, regularization parameter, …).

KNN

KNN (K-Nearest Neighbors) consists of finding the most similar examples (the neighbors) and assigning a new sample to their group. The algorithm is simple, easy to implement, and versatile: it can be used for both classification and regression. However, it becomes much slower as the number of training examples grows, and the choice of the distance metric, as well as the number of neighbors K, may not be obvious.
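The idea can be illustrated in a few lines with scikit-learn. This is an illustrative sketch, not part of the project's code; the data and the choice of K are made up for the example.

```python
# Toy KNN example: classify points by the majority label of their
# K nearest neighbors (here K = 3).
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]  # two clusters
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # K must be chosen by hand
knn.fit(X, y)
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])  # one point near each cluster
```

Note that both the distance metric and K are hyperparameters: changing them can change the predictions, which is exactly the drawback mentioned above.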

Deep Learning

Deep learning allows real-time optimization, avoids manual re-engineering at each data modification, saves time when setting up the network, and lets new input variables be introduced to improve predictions. However, neural networks do not provide explanations for their results, which limits the analysis of the underlying phenomena; there is not yet a way to determine the optimal network architecture; and they need a lot of training data to give the best results.

Chosen Method

Now that you have seen the different methods we could use to solve our problem, we need to choose one. This choice depends on the kind of data we have, how much of it we have, and what we want to do with the final model.

  • For the dataset, we have enough images for training, validation, and testing, taken from the Kaggle database.
  • We need a fast model so that we can deploy it in a desktop application or even in a mobile app.
  • We have a machine to do the training.
  • We need a model with good accuracy.

Given these conditions, deep learning is the right tool for the job.

Experiment

Having presented the methods you could use to solve this problem, we settled on deep learning for our project. Rather than creating a model from scratch, we decided to use transfer learning.

Transfer Learning

For that, we will use a pre-trained model called Inception version 3. Transfer learning means taking a model that has already been trained on a large dataset, deleting its output layer, and replacing it with some layers that we create ourselves. These added layers are specific to our data; the pre-trained part, as we said, is left untouched, so we only train the part that we added.
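The setup described above can be sketched as follows. This is a minimal sketch assuming the Keras API and the 229×229 input size used elsewhere in this article; the head layers (build_model, the 512-unit dense layer) are illustrative choices, not necessarily the ones in the original program.

```python
# Transfer learning sketch: frozen InceptionV3 base + a new trainable head.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input
from tensorflow.keras.models import Model

NUM_CLASSES = 225  # one output per bird species

def build_model(num_classes=NUM_CLASSES, weights="imagenet"):
    # Pre-trained base with its original classification head removed.
    base = InceptionV3(weights=weights, include_top=False,
                       input_tensor=Input(shape=(229, 229, 3)))
    # Freeze the pre-trained part: we only train the layers we add.
    for layer in base.layers:
        layer.trainable = False
    # New head, specific to our classification problem.
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(512, activation="relu")(x)
    outputs = Dense(num_classes, activation="softmax")(x)
    return Model(inputs=base.input, outputs=outputs)
```

Because only the head is trainable, training is much faster than training the whole network, and far less data is needed.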

Data Augmentation

Data augmentation is a technique for creating new data from an already existing dataset. Many machine learning algorithms require a good amount of data to train a good classifier. To generate more data, we apply different transformations to the pre-existing data: rotation, translation, skewing, zooming, shearing, cropping, etc. These transformations can also be combined, so it is easy to imagine how powerful this technique is.
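In Keras, these transformations can be expressed with ImageDataGenerator. This is a sketch under the assumption that a Keras-style generator is used; the specific ranges below are example values, not the ones from the original program.

```python
# Data augmentation sketch: random rotation, zoom, shear, translation
# and flips applied on the fly to each training image.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,      # random rotation, in degrees
    zoom_range=0.2,         # random zooming
    shear_range=0.2,        # random shearing
    width_shift_range=0.1,  # random horizontal translation
    height_shift_range=0.1, # random vertical translation
    horizontal_flip=True,
)

# One fake 229x229 RGB image; flow() yields endless augmented variants.
images = np.random.rand(1, 229, 229, 3)
batch = next(augmenter.flow(images, batch_size=1))
```

Each call to the generator produces a new random variant, so a single image can yield many different training samples.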

How To Use The Program And The Model

In this section, we explain how you can use the program yourself. We start with the training part and what you need to change to build your own model, then we show how you can use our model directly if you want to.

Training Part

For the training, you can download the file using this link. There you will find the program ready to be executed, but there are some changes you should make. They depend on the dataset you are using, and of course you should change the paths to your directories.

Dataset

In our program we used images of 229×229 pixels and configured the program to work with that size. If your images are already that size, you can use them directly; otherwise, you should change the size in the program. Here are the lines you need to change in the training part:

Line 17:

base_model = InceptionV3(weights="imagenet", include_top=False, input_tensor=Input(shape=(229, 229, 3)))

The shape argument defines the dimensions of the input image; start by changing this to the right size.

Line 54, 60, 72:

target_size=(229,229)

Here also you need to put the right size of your images.

Paths

That's it for the size. Now let's talk about the paths.

In the last part of the program (lines 163 and 164), you should put the path of the training data and the path where you want to save the model.

Test Part

For the test, you can find the program using this link. What you need to change depends on the images you are using. If they are the same size as the training data, you don't need to change the size. But you probably want a general program that works with images of any size; for that, you should add a resizing step, so that test images of the wrong size are resized automatically and you don't need to think about them.

Then, of course, you should put the right path to the model, whether you are using ours or your own. Here is the link to my model.

For the test, when you choose an image to predict, you will find a predict function that does the job: all you need to do is call it with the path of the image you want to classify as its parameter.
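A function like that might be structured as follows. This is a hedged sketch, not the article's actual code (which is only available through its download links): the model object and the class-name list are assumptions here.

```python
# Sketch of a predict helper: run the model on a preprocessed batch
# and return the name of the most probable class.
import numpy as np

def predict(model, class_names, batch):
    """batch has shape (1, H, W, 3); class_names maps each output
    index of the model to a bird species name."""
    probabilities = model.predict(batch)
    return class_names[int(np.argmax(probabilities[0]))]
```

The model's softmax output is a probability per species, so the predicted name is simply the class with the highest probability.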

Application Part

Once you have tested the model with some images, you may want to share this project with your friends. Sharing the model and part of the program is not a good idea, because you don't know whether that person can execute the program. Instead, we put the final model behind a GUI built in Python with Tkinter. This application makes the model user friendly: instead of using the model directly through the program, the user clicks one button to select an image and another to predict its name. This is convenient for people who don't know how to use Python or machine learning.
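A minimal Tkinter sketch of the two-button interface described above could look like this. It is an assumption about the layout, not the article's own application; predict_fn stands in for the real model call.

```python
# Two-button GUI sketch: "Select image" picks a file, "Predict" runs
# the classifier on it and shows the predicted species name.
import tkinter as tk
from tkinter import filedialog

def build_app(predict_fn):
    root = tk.Tk()
    root.title("Bird Classifier")
    state = {"path": None}  # remembers the currently selected image

    result = tk.Label(root, text="No image selected")
    result.pack(padx=10, pady=5)

    def select_image():
        state["path"] = filedialog.askopenfilename()
        result.config(text=state["path"] or "No image selected")

    def run_prediction():
        if state["path"]:
            result.config(text=predict_fn(state["path"]))

    tk.Button(root, text="Select image", command=select_image).pack(pady=5)
    tk.Button(root, text="Predict", command=run_prediction).pack(pady=5)
    return root

if __name__ == "__main__":
    # Dummy predictor so the window can be tried without a model.
    build_app(lambda path: "robin (dummy prediction)").mainloop()
```

The real application would pass the model's predict function in place of the dummy lambda.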

In this case, you can create your own application or use ours. Here is the link to the final application that we used.

The application will look like this:

