Introduction to Computer Vision

Nov 17, 2021

Computer vision is a field of AI that focuses on giving computers the ability to see and interpret the world around them in a way similar to how humans do. It involves teaching computers to observe the physical world, analyze visual data such as images and video, and extract insights from it.

Computer vision is one of the most promising areas of research in artificial intelligence and computer science, and it offers great benefits to businesses today.

Image Processing vs Computer Vision

Image processing involves altering an image to produce a new image with improved characteristics. The image might be resized, cropped, or blurred, its brightness and contrast adjusted, or any number of other digital transformations applied. Digital image processing does not consider what the image depicts. Modern computer vision, on the other hand, means enabling computers to process visual data and extract insights from it. Here the content is crucial: the aim is to teach computers to recognize, classify, and categorize visual information.
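
As a minimal sketch of the image processing side of this distinction, the two functions below adjust the brightness of an 8-bit grayscale image and crop it, assuming the image is stored as a plain Python list of rows. The function names and the tiny example image are illustrative assumptions, not part of any library. Neither function needs to know what the image shows.

```python
# A tiny 4 x 4 grayscale image: each value is a pixel brightness from 0 to 255.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 250],
]

def adjust_brightness(img, delta):
    """Add delta to every pixel, clamping the result to the 0..255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

def crop(img, top, left, height, width):
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

brighter = adjust_brightness(image, 20)  # 250 + 20 is clamped to 255
patch = crop(brighter, 1, 1, 2, 2)
print(patch)  # [[80, 90], [120, 130]]
```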

How Computer Vision Works

At its core, computer vision is a form of pattern recognition. We can train computers to understand visual data by feeding them large amounts of labeled images, ideally thousands or even millions. Using various algorithms and techniques, the computer analyzes these images and finds patterns in them.

For example, if we feed the computer a million labeled pictures of dogs and cats, algorithms can analyze the outlines, shapes, distances between parts, colors of each part, and so on, and eventually learn what dogs and cats look like. The computer uses what it has learned to build a profile of each animal, and this learned profile can be considered a model. When the model is later given a new, unseen image, it can predict whether the image shows a dog or a cat, ideally with high accuracy.
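
The idea of learning a profile from labeled examples can be sketched with a nearest-centroid classifier, one of the simplest learning methods. It is a stand-in chosen for illustration, not the specific algorithm the text refers to. The two features (ear pointiness and snout length) and their values are hypothetical; a real system would compute features from the pixels of labeled images.

```python
# Hypothetical, hand-made feature vectors: (ear_pointiness, snout_length), both in 0..1.
# In a real system these numbers would be extracted from labeled images.
training_data = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.85, 0.25), "cat"),
    ((0.3, 0.8), "dog"),
    ((0.4, 0.9), "dog"),
    ((0.2, 0.7), "dog"),
]

def train(data):
    """Build a 'profile' per label: the average feature vector of its examples."""
    sums, counts = {}, {}
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose profile is closest (squared Euclidean distance)."""
    def distance(profile):
        return sum((f - p) ** 2 for f, p in zip(features, profile))
    return min(model, key=lambda label: distance(model[label]))

model = train(training_data)
print(predict(model, (0.75, 0.35)))  # cat
print(predict(model, (0.35, 0.85)))  # dog
```

Here the model is nothing more than one average feature vector per class, and prediction picks the class whose average is closest to the new example.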

How Does a Computer See an Image?

A computer sees an image as a grid of pixels, which is stored as an array of numbers. The image below illustrates this process.

In other words, computers see images as matrices. A grayscale image has a single channel, so it can be represented as a 2D matrix in which each element is the brightness of the corresponding pixel. Color images have three channels: red, green, and blue (RGB). A color image can therefore be represented as a 3D array whose depth is 3 (height x width x 3).

In a common 8-bit image, each pixel value is an integer between 0 and 255. A grayscale image of size 12 x 16 is therefore stored as a 12 x 16 array of integers. If the image were in color, each pixel would have three values (one each for red, green, and blue), so the array would be 12 x 16 x 3.
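
The array shapes described above can be checked with a short sketch that uses plain nested Python lists, so it runs without dependencies. It assumes 12 x 16 means 12 rows by 16 columns. The shape helper is a hypothetical function written for this example; in practice, libraries such as NumPy store images as arrays that expose the same information through a shape attribute.

```python
height, width = 12, 16

# Grayscale: one brightness value (0..255) per pixel, here a uniform mid-gray.
gray = [[128 for _ in range(width)] for _ in range(height)]

# Color: three values (red, green, blue) per pixel, here pure red everywhere.
color = [[[255, 0, 0] for _ in range(width)] for _ in range(height)]

def shape(array):
    """Return the dimensions of a nested list, e.g. (12, 16) or (12, 16, 3)."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        if not array:
            break
        array = array[0]
    return tuple(dims)

print(shape(gray))   # (12, 16)
print(shape(color))  # (12, 16, 3)
print(color[0][0])   # [255, 0, 0]: red, green, and blue values of the top-left pixel
print(sum(len(row) for row in gray))  # 192 values in total (12 * 16)
```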

How Do Algorithms Analyze Images?

Algorithms that analyze images do not look at the picture as a whole the way humans do. Instead, they work with individual pixels, the smallest addressable elements of an image, and the numerical values those pixels hold. In essence, an algorithm analyzes and learns from these pixel values, which means a computer perceives and recognizes images through numbers. Algorithms identify patterns in images by analyzing these values and comparing them, for example with neighboring pixels or with values seen in other images.
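
A minimal sketch of comparing pixel values: subtracting each pixel from its right-hand neighbor turns a sharp dark-to-bright boundary into a large number, while uniform regions become zero. The image and the function names are made up for illustration; this simple finite difference is the basic idea behind many edge detectors.

```python
# A 4 x 6 grayscale image: dark on the left (20), bright on the right (200).
image = [
    [20, 20, 20, 200, 200, 200],
    [20, 20, 20, 200, 200, 200],
    [20, 20, 20, 200, 200, 200],
    [20, 20, 20, 200, 200, 200],
]

def horizontal_differences(img):
    """Difference between each pixel and its right-hand neighbor (width shrinks by 1)."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]

def strongest_edge_column(img):
    """Index x where brightness changes most between columns x and x + 1, summed over rows."""
    diffs = horizontal_differences(img)
    totals = [sum(abs(row[x]) for row in diffs) for x in range(len(diffs[0]))]
    return totals.index(max(totals))

print(horizontal_differences(image)[0])  # [0, 0, 180, 0, 0]
print(strongest_edge_column(image))      # 2: the boundary lies between columns 2 and 3
```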

Conventional vs Deep Learning-Based Computer Vision

Rule-Based

Early computer vision systems relied on rule-based techniques to detect and classify groups of pixels, which required substantial manual effort. Humans selected the features they believed were relevant to each object, and the machine was explicitly told, for example, that “cats have legs, legs have thighs and paws, and paws have toes.” Each of these features was codified into rigid rules that a computer could check for in an image.
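
Writing out rules for legs, paws, and toes would take many hand-built detectors, so this sketch shows the same rule-based style on a simpler task: a human-chosen color threshold that marks red pixels and reports the box around them. The thresholds (150 and 100) and the function names are arbitrary assumptions. The point is that every decision comes from a rule a person wrote, not from data.

```python
# Hand-written rule chosen by a human, not learned from data (thresholds are arbitrary).
def is_red(pixel):
    r, g, b = pixel
    return r > 150 and g < 100 and b < 100

def find_red_region(img):
    """Return (top, left, bottom, right) bounding the pixels that satisfy the rule, or None."""
    coords = [(y, x) for y, row in enumerate(img) for x, p in enumerate(row) if is_red(p)]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))

W = (255, 255, 255)  # white pixel (R, G, B)
R = (220, 30, 40)    # red pixel (R, G, B)
image = [
    [W, W, W, W],
    [W, R, R, W],
    [W, R, R, W],
    [W, W, W, W],
]
print(find_red_region(image))  # (1, 1, 2, 2)
```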

Machine Learning

In classical machine learning, a set of features must be defined before learning can take place. Feature extraction often relies on established algorithms that recognize patterns in images, such as edge detection algorithms. Once the features are defined, statistical learning algorithms such as linear regression, logistic regression, decision trees, or support vector machines (SVMs) use them to classify images and detect objects.
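
The classical pipeline (hand-crafted features first, then a statistical learner) can be sketched in plain Python. The two features, the tiny synthetic "flat" and "striped" images, and their labels are all hypothetical. Logistic regression is implemented from scratch with gradient descent to keep the example dependency-free; in practice one would typically use a library implementation such as scikit-learn's LogisticRegression.

```python
import math

def features(img):
    """Hand-crafted features, both scaled to 0..1: mean brightness and mean horizontal edge strength."""
    pixels = [p for row in img for p in row]
    brightness = sum(pixels) / len(pixels) / 255
    diffs = [abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1)]
    edge_strength = sum(diffs) / len(diffs) / 255
    return [brightness, edge_strength]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=1.0, epochs=5000):
    """Fit weights and a bias by full-batch gradient descent on the mean log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / n
            grad_b += error / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, img):
    """Return 1 for 'striped' and 0 for 'flat' (hypothetical labels)."""
    x = features(img)
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Tiny synthetic training set of 2 x 4 images: flat (label 0) and striped (label 1).
flat = [[[v] * 4] * 2 for v in (50, 120, 200)]
striped = [[[lo, hi, lo, hi], [hi, lo, hi, lo]] for lo, hi in ((0, 255), (30, 220), (60, 180))]
images = flat + striped
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression([features(img) for img in images], labels)
print(predict(w, b, [[90] * 4] * 2))                            # 0 (flat)
print(predict(w, b, [[10, 240, 10, 240], [240, 10, 240, 10]]))  # 1 (striped)
```

The learner only ever sees the two numbers returned by features(). If those features were poorly chosen, no amount of training would compensate, and that dependence on human-designed features is what deep learning removes.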

Deep Learning

Deep learning does not require features to be defined by hand. It is generally a more effective way to do computer vision, and it uses a specific kind of model called a neural network. If you feed a neural network many examples of a particular kind of data, it can discover patterns across the examples and encode them in a mathematical function that helps it classify future data. Most deep learning approaches to computer vision tasks are based on convolutional neural networks.
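
The kind of function a neural network computes can be written out explicitly. This sketch shows the forward pass of a tiny fully connected network with one hidden layer, y = sigmoid(W2 · relu(W1 · x + b1) + b2), applied to a flattened 2 x 2 image. The weights are hand-picked placeholders, not learned values; training would adjust them automatically from labeled examples.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def forward(pixels, params):
    """Compute y = sigmoid(W2 . relu(W1 . x + b1) + b2) for a flattened image x."""
    hidden = relu(dense(pixels, params["W1"], params["b1"]))
    (logit,) = dense(hidden, params["W2"], params["b2"])
    return 1 / (1 + math.exp(-logit))

# Placeholder weights: 4 inputs (a 2 x 2 image, flattened row by row) and 2 hidden units.
# Training would set these numbers; here they are hand-picked for illustration only.
params = {
    "W1": [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]],
    "b1": [0.0, 0.0],
    "W2": [[2.0, 2.0]],
    "b2": [-1.0],
}

x = [1.0, 0.0, 1.0, 0.0]  # bright left column, dark right column (pixels scaled to 0..1)
print(round(forward(x, params), 3))           # 0.953
print(round(forward([0.5] * 4, params), 3))   # 0.269 for a uniformly gray image
```

With these placeholder weights, an image with a bright left column and a dark right column produces a high output, while a uniformly gray image produces a low one.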

Convolutional Neural Network

A convolutional neural network (CNN or ConvNet) is a deep learning architecture that extracts features from images. Filters (also called kernels) are key to the success of CNNs in computer vision: each filter extracts a particular kind of feature from the input image.
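
A filter is a small grid of weights that slides over the image; at each position, the output is the sum of the element-wise products between the filter and the pixels underneath it. The sketch below implements this in plain Python with no padding and a stride of 1, and applies a Sobel-style vertical-edge filter to a made-up 4 x 6 image. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped), which is what CNN layers usually call convolution.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the weighted sums.

    As in most deep learning libraries, this is cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j] for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

# A Sobel-style vertical-edge filter: responds where brightness increases from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

for row in convolve2d(image, vertical_edge):
    print(row)  # [0, 36, 36, 0] on each of the 2 output rows
```

The filter outputs zero over flat regions and a strong response where the image changes from dark to bright.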

Manually designing all the filters needed to extract features for image classification and detection is impractical. The main purpose of a CNN is to learn these feature representations (the filters) from the dataset.
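
To illustrate what learning a filter from data means, this sketch fits the two weights of a 1 x 2 filter by gradient descent on the mean squared error. The setup is an assumption made for illustration: the targets are generated by a hidden filter [[-1, 1]] that the learner never sees, and the image values are made up. A real CNN learns many filters across many layers at once via backpropagation, usually with a framework rather than hand-written loops.

```python
def convolve2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j] for i in range(kh) for j in range(kw))
            for x in range(len(image[0]) - kw + 1)
        ]
        for y in range(len(image) - kh + 1)
    ]

image = [
    [0.0, 1.0, 0.5, 0.2],
    [0.9, 0.1, 0.7, 0.3],
    [0.4, 0.8, 0.0, 0.6],
    [0.2, 0.5, 1.0, 0.1],
]

# Stand-in for a labeled dataset: targets come from a filter the learner does not know.
hidden_filter = [[-1.0, 1.0]]
target = convolve2d(image, hidden_filter)

def learn_filter(image, target, kh, kw, lr=0.5, steps=500):
    """Learn a kh x kw filter by gradient descent on the mean squared error."""
    kernel = [[0.0] * kw for _ in range(kh)]
    n = len(target) * len(target[0])
    for _ in range(steps):
        output = convolve2d(image, kernel)
        grad = [[0.0] * kw for _ in range(kh)]
        for y, row in enumerate(output):
            for x, value in enumerate(row):
                error = value - target[y][x]
                for i in range(kh):
                    for j in range(kw):
                        # d(error^2)/d(kernel[i][j]) = 2 * error * pixel under kernel[i][j]
                        grad[i][j] += 2 * error * image[y + i][x + j] / n
        kernel = [[kernel[i][j] - lr * grad[i][j] for j in range(kw)] for i in range(kh)]
    return kernel

learned = learn_filter(image, target, kh=1, kw=2)
print([[round(w, 3) for w in row] for row in learned])  # [[-1.0, 1.0]]
```

Starting from all zeros, gradient descent recovers the hidden filter, so no one had to design it by hand.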

CNNs treat images as matrices and extract spatial information from them, such as edges, depth, and texture. Each layer in a CNN extracts different information from its input. The first layers typically detect basic characteristics such as horizontal and vertical edges. As you go deeper into the network, the layers detect more complex features, such as corners and shapes, and the final layers can recognize high-level concepts such as faces, buildings, and places.
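
The layered behavior described above comes from stacking operations so that each layer works on the previous layer's output. This sketch chains convolution, ReLU, and 2 x 2 max pooling, then applies a second convolution. Both filters are hand-picked assumptions (a trained network would learn them), but the sketch shows one mechanism behind the hierarchy: after pooling, the single value in the second layer depends on the entire 6 x 6 input, so deeper layers can respond to larger, more complex structures than the first layer.

```python
def convolve2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation, as in a CNN convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j] for i in range(kh) for j in range(kw))
            for x in range(len(image[0]) - kw + 1)
        ]
        for y in range(len(image) - kh + 1)
    ]

def relu(feature_map):
    """Zero out negative responses, keeping only positive evidence for a feature."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[y + i][x + j] for i in range(size) for j in range(size))
            for x in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for y in range(0, len(feature_map) - size + 1, size)
    ]

# 6 x 6 input: a bright vertical bar (columns 2 and 3) on a dark background.
image = [[0, 0, 9, 9, 0, 0] for _ in range(6)]

# Layer 1: a hand-picked vertical-edge filter; ReLU keeps only dark-to-bright edges.
edge_filter = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
layer1 = max_pool(relu(convolve2d(image, edge_filter)))  # 6x6 -> 4x4 -> 2x2

# Layer 2: a hypothetical filter that combines layer-1 responses over a wider area.
combine_filter = [[1, 1], [1, 1]]
layer2 = relu(convolve2d(layer1, combine_filter))  # 2x2 -> 1x1

print(layer1)  # [[36, 0], [36, 0]]
print(layer2)  # [[72]]
```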

Source: https://developers.google.com/machine-learning/practica/image-classification/images/cnn_architecture.svg

Common Computer Vision Tasks

Image classification
Object Detection
Instance Segmentation
Semantic Segmentation
Face Recognition

Source: https://engineering.matterport.com/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46

Computer vision applications include facial recognition, medical image analysis, self-driving cars, intelligent video analytics, and more.