What is causing this variable error in neural network image classification?

Note: I have provided a reproducible example below.

# If prompted to update packages after running the two install lines below, type "n" in the console and continue
# The model uses TensorFlow as its backend

# Install EBImage
source("https://bioconductor.org/biocLite.R")
biocLite("EBImage", suppressUpdates = TRUE)

# Load packages
library(EBImage)
library(keras)

# Read images; these are just a bunch of plane and car images from Google
setwd("C:/Users/Joel/Desktop/SJU/Semester 3/R programming")
pics <- c('p1.jpg', 'p2.jpg', 'p3.jpg', 'p4.jpg', 'p5.jpg', 'p6.jpg',
          'c1.jpg', 'c2.jpg', 'c3.jpg', 'c4.jpg', 'c5.jpg', 'c6.jpg')
mypic <- list()
for (i in 1:12) {mypic[[i]] <- readImage(pics[i])}

# Explore
print(mypic[[1]])
display(mypic[[8]])
summary(mypic[[1]])
hist(mypic[[2]])
str(mypic)

# Resize
for (i in 1:12) {mypic[[i]] <- resize(mypic[[i]], 28, 28)}
str(mypic)

# We want a single vector of length 2352 (28 * 28 * 3), hence we reshape
# Reshape
for (i in 1:12) {mypic[[i]] <- array_reshape(mypic[[i]], c(28, 28, 3))}
str(mypic)

# Row bind
trainx <- NULL
for (i in 7:11) {trainx <- rbind(trainx, mypic[[i]])}
str(trainx)
testx <- rbind(mypic[[6]], mypic[[12]])

# Let's represent plane by 0 and car by 1
trainy <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
testy <- c(0, 1)

# One-hot encoding
trainLabels <- to_categorical(trainy)
testLabels <- to_categorical(testy)
trainLabels # Notice this creates 2 dummy columns: a "1" in column 1 represents plane, a "1" in column 2 represents car

# Model
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = "relu", input_shape = c(2352)) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 2, activation = "softmax")
summary(model)

# Compile
model %>%
  compile(loss = "binary_crossentropy",
          optimizer = optimizer_rmsprop(),
          metrics = c("accuracy"))

# Fit model
history <- model %>%
  fit(trainx,
      trainLabels,
      epochs = 30,
      batch_size = 32,
      validation_split = 0.2)

The error I receive after running the fit step above is:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: Input arrays should have the same number of samples as target arrays. Found 5 input samples and 10 target samples.

Detailed traceback:

  File "D:\Users\Joel\Anaconda3\lib\site-packages\keras\engine\training.py", line 952, in fit
    batch_size=batch_size)
  File "D:\Users\Joel\Anaconda3\lib\site-packages\keras\engine\training.py", line 804, in _standardize_user_data
    check_array_length_consistency(x, y, sample_weights)
  File "D:\Users\Joel\Anaconda3\lib\site-packages\keras\engine\training_utils.py", line 237, in check_array_length_consistency
    'and ' + str(list(set_y)[0]) + ' target samples.')

This means that the length of trainy should be the same as the sample dimension (first dimension) of trainx, which probably isn't the case (looking at the above they seem to be 10 and 5, respectively).
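For example, you can confirm the mismatch directly from the objects built above (just a quick check, nothing keras-specific):

dim(trainx)       # first value = number of input samples fed to fit()
dim(trainLabels)  # first value = number of target samples; the two must match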

But how can I alter the length? Can you help?

Can you point out where exactly the issue lies in the code?

Whenever you use setwd(), the resulting code is not reproducible. If I try to run your code on my laptop, it will invariably fail, because there is no folder C:/Users/Joel/Desktop/SJU/Semester 3/R programming on my file system (it would also fail because my laptop is a Mac and this is a Windows path, but that's beside the point). Have a look here for a longer explanation of why using setwd() makes your code non-reproducible:
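As a sketch of an alternative (assuming, purely for illustration, that the twelve images live in an images/ folder inside the project; that folder name is my invention, not something from your post), you can build the paths relative to the project root instead of calling setwd():

# Hypothetical layout: an images/ folder next to the R project file
pics  <- file.path("images", c(paste0("p", 1:6, ".jpg"), paste0("c", 1:6, ".jpg")))
mypic <- lapply(pics, EBImage::readImage)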

(post withdrawn by author, will be automatically deleted in 24 hours unless flagged)

I think that the point that was being made was that it is very difficult to give you a real answer without the substrate to reproduce the issue.

Imagine if your car had a problem, but the mechanic only had your description of the problem and was not allowed to drive or look at the car.

They would only be able to solve simple issues without all of the information.


Hi Max,

That's exactly my point. Also, the setwd() line is not the only one that needs to be modified: for example, without the input images, the code won't run:

pics <- c('p1.jpg', 'p2.jpg', 'p3.jpg', 'p4.jpg', 'p5.jpg', 'p6.jpg',
          'c1.jpg', 'c2.jpg', 'c3.jpg', 'c4.jpg', 'c5.jpg', 'c6.jpg')
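One workaround, purely as a sketch: generate twelve small placeholder JPEGs with the same file names, so that anyone can run the rest of the code. The random-noise content and the 28 x 28 size are my assumptions, just enough to exercise the pipeline.

# Write twelve random-noise JPEGs named like the originals (p1..p6, c1..c6)
library(EBImage)
set.seed(1)
for (nm in c(paste0("p", 1:6, ".jpg"), paste0("c", 1:6, ".jpg"))) {
  img <- Image(array(runif(28 * 28 * 3), dim = c(28, 28, 3)), colormode = "Color")
  writeImage(img, nm)
}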

Finally (though this is probably a matter of personal preference), I'd rather not install packages from Bioconductor unless I really have to:

source("https://bioconductor.org/biocLite.R")

However, this is not as important as the other two issues: I could always run the code in an RStudio Cloud instance, if it were reproducible.


It was not my intention to be rude or snarky with my response to you @Andrea. I will keep it in mind henceforth. Thanks for your help.


Thanks for the clarification :slight_smile: Now, what about slightly modifying your code so that I can run it and try to help you? A classic issue with remotely debugging convnet code is that it's hard to run it without sample images. We can solve this in two ways: either you upload the images you used (there's a button to upload images on Discourse, though I'm not sure how many images, and of what size, you can upload), or you follow the procedure in this blog post and use the Kaggle dogs vs. cats dataset instead:

This is a bit more annoying because everyone who wants to check your code has to get a Kaggle account, but given the circumstances I don't have other ideas. Other users on this forum may want to chime in.

Finally, is it really necessary to use the EBImage package in your code? keras already has a nifty image_load() function (have a look at its help page, or see alternative approaches in the blog post I linked). By eliminating an unnecessary dependency, you make your sample code closer to an MRE (Minimal Reproducible Example). This helps both you and us focus on the very core of the issue that is blocking you, without unnecessary distractions.
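For instance, here is a minimal sketch of reading one image with keras alone (the file name is just a placeholder taken from your list), resizing it on load, and flattening it to the 2352-length vector your first layer expects:

library(keras)

img <- image_load("p1.jpg", target_size = c(28, 28))  # read and resize in one step
x   <- image_to_array(img)                            # 28 x 28 x 3 numeric array
x   <- array_reshape(x, c(1, 28 * 28 * 3))            # one row of 2352 values, ready to rbind

Doing the resize inside image_load() would also let you drop the separate EBImage::resize() step.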

PS: just for clarity, JJ Allaire's blog post (the one I linked above) is concerned with a more complicated problem than your current one. He wants to (partially) train the model from scratch, while you only want to try a pretrained model. So you don't need to go through all of it, even though it could still be useful to you, because his Keras blog posts are excellent!


This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.