Blog #1 About my PhD: Introduction to the Subject

My Journey during my PhD in Computer Vision

My name is Mohammed El Amine MOKHTARI, and I am a computer vision research assistant who is also pursuing a PhD. I decided to start this blog series to share with you some of the papers, datasets, and courses that I will be using during my PhD.

What is the Topic of my Ph.D.?

For the time being, we are still working on fixing a specific subject. As you may know, it is difficult to pin down the exact subject at the beginning of a PhD, and I wanted to be honest with you about this. The project we are working on is large, but we cannot simply use the project’s name as the PhD’s title.

What is the Project We are Working on?

The project we’re working on is to build a high-tech drone that is capable of navigating on its own. It can enter a building where there may be a fire or criminals, analyze each room, and detect anything out of the ordinary so that humans can intervene.

The first tasks we’ll tackle are:

Program the drone to remain stable in the position specified by the pilot.
Program the drone to navigate from one location to another on its own.
Configure the drone to detect objects.
Program the drone to detect anomalies.

The Papers that I Found Intriguing

Like any PhD student, I began by reviewing the state of the art: reading papers and taking courses on the specific tasks we will need to complete the project.
I looked for papers whose work was related to what we are going to do. I found a large number of them, but I will only share those that may be of interest to you.

#1 Deep Drone

The paper is titled ‘Deep Drone’. In it, the authors use drones to detect objects, and they compare two detectors: YOLO and Faster R-CNN. Faster R-CNN is more accurate and produces better results, but it is slower, which can cause issues in real-time projects. It is slower because its architecture has two stages: it first proposes candidate regions and then classifies each of them (I learned this in an Object Detection course that I will talk about later). YOLO makes its predictions in a single pass, so its execution time is shorter, which is why it is usually the preferred choice for real-time object detection.
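
To make the real-time constraint a bit more concrete, here is a minimal sketch of the arithmetic behind it: a detector keeps up with a video stream only if it processes a frame in less time than the gap between two frames. The function names and the latency values below are hypothetical placeholders that I chose for illustration; they are not measurements from the paper.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def is_real_time(latency_ms, fps):
    """True if a detector with this per-frame latency keeps up with the stream."""
    return latency_ms <= frame_budget_ms(fps)


# Hypothetical per-frame latencies for illustration only,
# NOT results reported in the paper.
latencies_ms = {"one_stage_detector": 25.0, "two_stage_detector": 120.0}

for name, latency in latencies_ms.items():
    print(name, "keeps up at 30 FPS:", is_real_time(latency, fps=30))
```

With these placeholder numbers, the budget at 30 FPS is about 33 ms per frame, so only the faster detector fits. On a real drone the latency would have to be measured on the onboard hardware.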

#2 Event Detection and Classification in Video Surveillance Sequences

The second paper is about event detection, which we will need later in the project. I have used an approach similar to theirs before, and I even wrote an article about it. Although the title says Event Detection and Classification, the paper only performs classification. The way they classify events could loosely be called detection, but it is not object detection as we know it.

They use a Bag of Visual Words (BoVW), which follows the same idea as the Bag of Words discussed in this article. The idea is to extract key points from the images and use them as the features for training. This means we do not need a neural network for this type of classification, because the key points extracted for the Bag of Visual Words already provide the features required. In this case, we can perform the classification with a simple machine learning algorithm such as SVM, k-NN, etc. When using BoVW, we must take the following steps:

Extract the features from the images (using the SIFT algorithm). Each key point comes with a descriptor that summarizes the image patch around it.
Build a dictionary of visual words from these features by grouping similar descriptors together.
Describe each image by the visual words it contains, then run a classification algorithm on that representation.

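The algorithms they used were SVM and K-means. K-means is a clustering algorithm rather than a classifier; in a BoVW pipeline it is typically what groups the descriptors into the dictionary, while the SVM performs the classification.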
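
To make the BoVW steps above more concrete, here is a minimal, dependency-free sketch. It is not the paper’s implementation and it rests on several assumptions: the descriptors are synthetic 2-D points standing in for real SIFT descriptors (which are 128-dimensional), the dictionary is built with a small hand-written k-means, and the final classifier is a 1-nearest-neighbour rule instead of an SVM. All function names and class labels are hypothetical.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def build_vocabulary(descriptors, k, iters=10):
    """Cluster descriptors with k-means; the k centroids are the visual words."""
    # Farthest-point initialisation keeps this toy example deterministic.
    centers = [descriptors[0]]
    while len(centers) < k:
        centers.append(max(descriptors,
                           key=lambda d: min(math.dist(d, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, centers)].append(d)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [[sum(axis) / len(g) for axis in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def encode(descriptors, vocabulary):
    """Normalised histogram of visual-word occurrences for one image."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1
    return [count / len(descriptors) for count in hist]


def fake_image(rng, counts):
    """Hypothetical stand-in for SIFT output: 2-D points around three 'textures'."""
    textures = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
    return [[cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)]
            for (cx, cy), n in zip(textures, counts) for _ in range(n)]


rng = random.Random(0)
# Synthetic training set: the two classes differ in which texture dominates.
train = ([(fake_image(rng, (20, 5, 5)), "class_a") for _ in range(3)]
         + [(fake_image(rng, (5, 20, 5)), "class_b") for _ in range(3)])

# Steps 1 and 2: pool all descriptors and build the dictionary of visual words.
all_descriptors = [d for image, _ in train for d in image]
vocabulary = build_vocabulary(all_descriptors, k=3)

# Step 3: describe each image as a histogram of visual words, then classify.
train_hists = [(encode(image, vocabulary), label) for image, label in train]


def classify(image):
    """Label of the training image whose histogram is closest (1-NN)."""
    hist = encode(image, vocabulary)
    return min(train_hists, key=lambda pair: math.dist(hist, pair[0]))[1]


print(classify(fake_image(rng, (20, 5, 5))))  # class_a
print(classify(fake_image(rng, (5, 20, 5))))  # class_b
```

In a real project, the synthetic points would be replaced by descriptors extracted from the images, and the 1-nearest-neighbour rule by a stronger classifier such as an SVM. The overall flow (descriptors, dictionary, histogram, classifier) stays the same.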

If you’re curious and want to learn more about the paper, click on this link.

Courses I Found Useful

I did not begin my PhD in the same way that I began my internship. I didn’t want to just grab some code from GitHub, make some changes, and then run it, because I discovered that this is the worst way to complete a project, especially if you are a computer vision specialist. A web developer who needs a piece of computer vision code can copy and paste it without needing to understand it. A computer vision specialist, however, must understand what the code does and why it was chosen, and if there are errors in the code, must understand where they come from. I resolved not to repeat this mistake for my PhD, so I began learning everything from the ground up. Here are some of the courses I’ve already finished and some that I’m about to start.

Deep learning for object detection using TensorFlow 2: This is one of the best object detection courses I’ve taken. The instructor, Nour Islam, explains the theoretical concepts behind object detection algorithms like Faster R-CNN, YOLO, and SSD. He then codes a real-world problem in which the model must detect whether or not people are wearing masks, and he demonstrates how to train both on a local machine and in the cloud. A highly recommended course. This course is distinct from the Deep Learning for Computer Vision: The Complete Bootcamp course, which covers the three main computer vision tasks (Classification, Object Detection, and Segmentation).
Machine Learning: This course is well-known; it was created by Andrew Ng and is widely regarded as one of the best machine learning courses. I began this course after completing some ML projects. As I said earlier, when I started those projects I did not know the fundamentals of machine learning, so when I ran into errors or overfitting, I had no idea how to solve them. Once I began this course, I started to make the connection between what I saw in the code and the mathematical equations behind it. I recommend this course to anyone who is just getting started with machine learning, whether for computer vision or not.
Convolutional Neural Network: This course was also created by Andrew Ng, and it is a fantastic course for understanding how CNNs work and the operations performed inside deep learning architectures. He also explains some of the well-known architectures such as AlexNet, ResNet, Inception, YOLO, U-Net, and so on. I also recommend that you take this course.

Conclusion

That was the first blog post about my PhD, in which I talked a little bit about the subject, the papers that I found interesting, and the courses that are useful for any computer vision student or engineer. In subsequent posts, I will share additional papers and courses that I have found to be good and useful.