My (first) Keras pre-trained VGG16 model leads to results that are inexplicable to me (beginner)

I developed a Shiny app for image classification using a pre-trained Keras VGG16 model and text2vec distance calculations.

The idea: you see a nice product outside the store and want to know whether that particular product is available in the store. So you take a photo, upload it to the app, and the app shows you the most similar products the store has to offer.

Much like the i-love-ikea app (see here). I tried to rebuild it with my own products. Since the author will most likely not respond to an issue on a three-year-old project, I brought the topic here; I hope that's fine.

For my project I began with 10 product images. First of all I extracted the features of these 10 images using the pre-trained VGG16 Keras model without its top layers, which results in a 10 x 25088 matrix.

Whenever a new image is uploaded to the app, the same Keras model is used to extract its features.
Afterwards the distances between the new image (a 1 x 25088 matrix) and the 10 images (the 10 x 25088 matrix) are calculated.

So far, so good. The curious thing that brought me here: when I upload an image of product X to the app, while the app is trained on that particular image of product X, the app says the similarity is only ~75%. It should be 100%, should it not?

Why is this? Is it due to some code issue, or to some more fundamental mathematical / deep-learning background that I have not considered, since I am a machine learning beginner?

The code is basically the same as the one from Longhow's app (see above). The same problem occurs in his app too, so I presume the issue is due to something I don't understand yet.

In any case, I appreciate every hint :innocent:

Hi @anon73295571,

I am not quite sure I understand your code (logic) flow - if you are using a pre-trained model, what do you mean by

... while the app is trained on that particular image of product X...

Do you mean "trained" on that image, or do you mean that the model is used to forward propagate the image through to the softmax layer - these are quite different things.

You're right, that makes no sense; when I use a pre-trained model, it is very unlikely that it was trained on the specific images I use for my project. So I can only assume I mean the latter.

What also makes me curious: when I calculate the similarities in RStudio, the value is 1. When I deploy to shinyapps.io, it is ~0.75.

If the X image is one of the 10 images that you compare it to, then indeed, its similarity (against itself, practically) should be 1, but without a reproducible example it's hard to argue why this is not the case.

I am also assuming that you are not re-training any of the layers - so you are really using the frozen weights exactly as they are in the pre-trained model? Because any re-training would introduce at least some randomness into the final model weights.

Also, when you say you compute the distance between a given image and 10 other images, you mean 10 distances, right? And then one of them (the one against the image itself) should yield a similarity of 1? Just to make sure I understand correctly.

Yes, image X is one of the 10 images I compare it with (as you said, it's against itself, practically).

code:

library(keras)
library(Matrix)

# VGG16 without the top (classification) layers, with frozen ImageNet weights
model <- application_vgg16(include_top = FALSE, weights = "imagenet")

img_path <- "productX.jpg"

# load, resize to 224 x 224, add the batch dimension, apply VGG16 preprocessing
img <- image_load(img_path, target_size = c(224, 224))
img <- image_to_array(img)
img <- array_reshape(img, c(1, dim(img)))
img <- imagenet_preprocess_input(img)

features <- model %>%
  predict(img)

# flatten the 1 x 7 x 7 x 512 feature tensor into a 1 x 25088 sparse matrix
M1 <- as(matrix(features, ncol = length(features)), Class = "dgCMatrix")
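The same steps can be wrapped in a small helper function, roughly like this (a sketch; the file names product1.jpg through product10.jpg are placeholders for my actual product images):

# sketch: helper reusing the model defined above
extract_features <- function(path) {
  img <- image_load(path, target_size = c(224, 224))
  img <- image_to_array(img)
  img <- array_reshape(img, c(1, dim(img)))
  img <- imagenet_preprocess_input(img)
  matrix(predict(model, img), nrow = 1)   # flatten to 1 x 25088
}

# placeholder file names for the 10 reference products
paths <- sprintf("product%d.jpg", 1:10)
M_ref <- as(do.call(rbind, lapply(paths, extract_features)), "dgCMatrix")  # 10 x 25088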

The same procedure (see the sketch above) applies to the other 9 images, as well as to new images uploaded into the app. Then I calculate the similarities:

sim <- 1 - text2vec::dist2(M1, M2, method = "cosine")

So I basically don't perform any re-training or anything like that (I wish I could, though).

Right!

Still not clear how M2 is calculated...

I am not familiar with the text2vec package - in your case the dimensions of M1 and M2 are not the same, and I am not sure how this particular function handles that - is it really doing what you expect it to do? Not sure why you use cosine, by the way (see e.g. here).
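One quick way to check is to compare dist2 against a hand-rolled cosine on toy data (a sketch):

library(text2vec)

set.seed(1)
A <- matrix(rnorm(3 * 5), nrow = 3)  # 3 "images" with 5 features each
b <- matrix(rnorm(5), nrow = 1)      # 1 new "image"

# dist2 compares every row of A with every row of b -> 3 x 1 matrix
d <- dist2(A, b, method = "cosine")

# hand-rolled cosine similarity for comparison
cos_sim <- (A %*% t(b)) / (sqrt(rowSums(A^2)) * sqrt(sum(b^2)))
all.equal(as.numeric(1 - d), as.numeric(cos_sim))  # should be TRUE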

I would expect that each image representation ends up as a vector, and the representations of the 10 images end up as a matrix with 10 rows. Then the distances are simply the distances of the vector against each of the rows of the matrix.

For debugging purposes, you can try explicitly writing a loop where the X features are simply compared to each row of your 10-image feature matrix - even just take something like sum(x - m[i, ]). If they are the same for one of the rows, that simple metric should return 0; if it doesn't, you need to look further into it.
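Something along these lines (a sketch, assuming x holds the 1 x 25088 features of image X and m is the dense 10 x 25088 reference matrix):

# sketch: compare the X features against each row of the reference matrix
x_vec <- as.numeric(x)
for (i in 1:nrow(m)) {
  cat(sprintf("row %2d: sum(diff) = %g, max(abs(diff)) = %g\n",
              i, sum(x_vec - m[i, ]), max(abs(x_vec - m[i, ]))))
}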

M2 was a little misleading; I could have named it M_new_image. It is calculated the same way as M1: it is the feature matrix of the image newly uploaded to the app, whose distances to the other images (matrices) are to be calculated.

The same goes for M1; a little misleading as well. In my previous post, M1 is the matrix of just one image. In my project I proceed like this: extract the features of the 10 images, transform each into a matrix, and bind them together:

# 10 x 25088 matrix (the 10 per-image feature matrices bound row-wise)
M1 <- rbind(M1, M2, M3, M4, M5, M6, M7, M8, M9, M10)

# 1 x 25088 matrix (new image)
M2

When I calculate the distances between the 10 images and the new image, the output is as follows, where row [2,] is the image against itself, resulting in one hundred percent similarity.

1-text2vec::dist2(M1, M2, method = "cosine")
WARN [2019-11-06 13:42:37] Sparsity will be lost - worth to calculate similarity instead of distance.
10 x 1 Matrix of class "dgeMatrix"
           [,1]
 [1,] 0.7930369
 [2,] 1.0000000
 [3,] 0.8383663
 [4,] 0.8161694
 [5,] 0.1866685
 [6,] 0.1716610
 [7,] 0.1950101
 [8,] 0.1831219
 [9,] 0.2744161
[10,] 0.2861535

When I deploy it to shinyapps.io, the similarity goes down to ~0.75.

OK - that's quite weird... Can you verify that the package versions are the same? Right now nothing else comes to mind... I don't use shinyapps.io, but I do use RStudio Connect and have not come across similar behavior. In the end you might need to look into the actual output of predict(img) in the two locations - get some stats out, like the mean, max, and min, to see whether something is off with the model itself or the predict method.
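Something as simple as this, run in both environments, would already tell you whether the features differ (a sketch):

# sketch: run identically in RStudio and on shinyapps.io, then compare the output
features <- predict(model, img)
cat("non-zero:", sum(features != 0),
    "| mean:", mean(features),
    "| min:", min(features),
    "| max:", max(features), "\n")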

I actually solved it.
The 10 images I gave the model to extract features from were .tif images (all > 30 MB), but the new images I loaded into the app were .jpg files.

I never would have thought this would make a difference, since you specify the target size in image_load() as target_size = c(224, 224).
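In hindsight, the mismatch is easy to confirm by running the same product image through the pipeline in both formats (a sketch, assuming the image exists as both .tif and .jpg, and using the extract_features helper sketched earlier):

# sketch: same product image saved in the two formats
f_tif <- extract_features("productX.tif")
f_jpg <- extract_features("productX.jpg")
max(abs(f_tif - f_jpg))  # anything > 0 means the formats yield different features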

Anyway, thank you very much for your help!

Great - sometimes investigations bring us back to the "roots". Please remember to mark a solution, even your own.
