Image Processing in Python

Deep learning is a widely used technique known for its high accuracy. It can be applied in various fields such as regression and classification, image processing, and natural language processing. The downside of deep learning is that it requires a large amount of data and high computational power to tune the parameters, so it may not be an effective way to solve simple problems or to build models on a small data set. One way to take advantage of the power of deep learning on a small data set is to use pre-trained models built by companies or research groups. VGG16, which is used in this post, was developed and trained by Oxford’s Visual Geometry Group (VGG) to classify images into 1000 categories. Besides classification, VGG16 can also be used for other image processing applications by changing the last layer of the model.

I have a 2-month-old daughter, so my wife and I had to prepare some clothes, accessories, and furniture before her arrival. My wife asked me to build a model to find onesies similar to the ones she wanted to buy. I can utilize the pre-trained deep learning model to help my wife find similar onesies.

To explain how a pre-trained deep learning model can be used in this situation, I collected a total of 20 images from the internet: 10 short sleeve baby onesies, eight other baby clothes, one adult t-shirt, and one pair of baby shoes. I included the other baby clothes to check whether the model can distinguish them from the onesies, and the t-shirt and shoes are included as outliers. This makes a good example of how pre-trained models can be used on a small data set.


Keras is a neural network library written in Python that runs on top of TensorFlow as its backend. The code below walks through the details of how to use VGG16 with Keras in Python.

from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from IPython.display import Image, display     

VGG16 requires the input image size to be 224 by 224. The function below pre-processes the images and converts them into arrays.

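def preprocess(file):
    # Load the image resized to the 224x224 input that VGG16 expects
    try:
        img = image.load_img(file, target_size=(224, 224))
    except OSError:
        # Report which file could not be read, then stop
        print('Error:', file)
        raise
    feature = image.img_to_array(img)
    feature = np.expand_dims(feature, axis=0)   # add a batch dimension
    feature = preprocess_input(feature)         # apply the VGG16 input normalization
    return feature[0]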

imgs = [preprocess('C:/Users/pc-procogia/Desktop/ProBlogia/'+str(i)+'.jpg') for i in range(1,21)]
X_pics = np.array(imgs)     
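To make the pre-processing step less of a black box, here is a minimal NumPy sketch of what `preprocess_input` does for VGG16 as far as I understand it: it reorders the channels from RGB to BGR and subtracts the ImageNet per-channel means, without any scaling. The helper name `vgg16_style_preprocess` and the dummy red image are my own illustration, not part of Keras; in practice you would keep calling `preprocess_input`.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (the Caffe-style constants
# used for VGG16 preprocessing).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(rgb_batch):
    """Hypothetical stand-in for preprocess_input, for illustration only."""
    batch = np.asarray(rgb_batch, dtype=np.float32)
    bgr = batch[..., ::-1]            # reorder channels from RGB to BGR
    return bgr - IMAGENET_BGR_MEANS   # zero-center each channel, no scaling

# Dummy batch: one 224x224 image in which every pixel is pure red.
dummy = np.zeros((1, 224, 224, 3), dtype=np.float32)
dummy[..., 0] = 255.0

out = vgg16_style_preprocess(dummy)
print(out.shape)     # the batch shape is unchanged
print(out[0, 0, 0])  # one pixel after preprocessing, in B, G, R order
```

The shape stays the same, so the result can be fed to the model exactly like the output of the `preprocess` function above.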

The last layer of a neural network determines the output of the model. VGG16 is primarily built for classification, so the last layer needs to be removed to extract the raw image features from the data set. In this post, no parameters will be trained; only the pre-trained layers will be used.

from keras.models import Model

def feature_extraction(images):
    base_model = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))

    # Freeze every layer so that only the pre-trained weights are used
    for layer in base_model.layers:
        layer.trainable = False

    # Remove the last (classification) layer by building a model that ends one layer earlier
    model = Model(inputs=base_model.input, outputs=base_model.layers[-2].output)
    model.summary()

    pic_features = model.predict(images)
    return pic_features

pic_features = feature_extraction(X_pics)     


As you can see from the results above, fc1 in the red box indicates that a total of 4096 features were extracted from each image, and the number of trainable parameters is zero since none of them were trained. The number of non-trainable parameters is more than 100 million, which means it would take a long time to tune them on your local machine. Because not everyone has access to a high-performance computing system, a pre-trained model comes in handy here.
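To see where the "more than 100 million" figure comes from, the parameter count can be reproduced by hand. The sketch below assumes the standard published VGG16 layout (thirteen 3x3 convolutions followed by three fully connected layers); the layer sizes are typed in from that architecture rather than read from the model summary, and the helper names are my own.

```python
def conv_params(in_channels, out_channels, kernel=3):
    # weights (kernel x kernel x in x out) plus one bias per output channel
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

# (input channels, output channels) of the 13 convolutional layers,
# assuming the standard VGG16 configuration
conv_channels = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

conv_total = sum(conv_params(i, o) for i, o in conv_channels)
fc1 = dense_params(7 * 7 * 512, 4096)    # flattened 7x7x512 feature map -> 4096
fc2 = dense_params(4096, 4096)
predictions = dense_params(4096, 1000)   # the 1000-class layer that gets removed

without_last_layer = conv_total + fc1 + fc2
full_model = without_last_layer + predictions

print('convolutional layers:', conv_total)
print('fc1:', fc1)
print('without the last layer:', without_last_layer)
print('full VGG16:', full_model)
```

Most of the parameters sit in the first fully connected layer, which is why the count stays above 100 million even after the classification layer is dropped.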

Now all features are extracted from each baby clothing image. Cosine similarity is a good metric for finding similar clothes because every image is now represented as a vector. The code below will calculate the cosine similarities and show the most similar clothes.

dists = cosine_similarity(pic_features)
dists = pd.DataFrame(dists)

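def get_similar(dists):
    L_list = []
    for item in range(len(dists)):
        # Rank all images by similarity to this one, as 1-based file numbers
        L = [i[0]+1 for i in sorted(enumerate(dists[item]), key=lambda x: x[1], reverse=True)]
        L_list.append(L)
        print('=== Your Favorite ===', item+1)
        display(Image(filename=str(item+1)+'.jpg', width=200, height=200))
        print('--- Similar Clothes ---')
        # The top match is normally the image itself, so show the next five
        for i in L[1:6]:
            display(Image(filename=str(i)+'.jpg', width=200, height=200))
    return L_list

similar = get_similar(dists)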
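For readers curious about what the similarity step computes, here is a small NumPy sketch of the same idea on toy data: normalize each feature vector to unit length, take dot products, and pick the highest-scoring neighbors. The function names and the four toy vectors are made up for illustration; the post itself relies on scikit-learn's `cosine_similarity`.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    features = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def top_k_similar(sims, index, k=5):
    """Indices of the k rows most similar to row `index`, excluding itself."""
    order = np.argsort(-sims[index], kind="stable")  # highest similarity first
    order = order[order != index]                    # drop the query image
    return order[:k].tolist()

# Toy "features": rows 0 and 1 point in the same direction,
# row 2 is halfway between the axes, row 3 is orthogonal to row 0.
toy = np.array([
    [1.0, 0.0],
    [2.0, 0.0],
    [1.0, 1.0],
    [0.0, 3.0],
])

sims = cosine_similarity_matrix(toy)
print(np.round(sims, 3))
print(top_k_similar(sims, 0, k=2))
```

Because cosine similarity ignores vector length, rows 0 and 1 count as identical even though one is twice as long, which is the behavior you want when comparing feature vectors of different magnitudes.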

If you use the image of a onesie as the input, the following five images are the similar clothes that the deep learning model chose. The model successfully picked the short sleeve onesies for us.


In the case below, the search results include onesies but also the adult t-shirt. It seems that color was one of the features that brought the adult t-shirt into the results.


The least similar image to the short sleeve onesies was the pair of baby shoes. The shape of the shoes is obviously different from that of the onesies. The simple feature extraction from images was able to distinguish them even though the model was not specifically tuned for this case.


To increase model accuracy or to focus on certain features of clothes such as color, shape, or pattern, you can add layers for those features and train them on top of the pre-trained model. Then you do not have to train more than 100 million parameters; you only train the additional layers, which your local machine can handle, to get better results.

Learning how to add additional features to the pre-trained model will be an interesting topic for the next post. I hope you enjoyed this post on deep learning and feel motivated to start your own projects using pre-trained models.