My background is not in math or statistics, but even after an extensive Google search I suspect that non-linear PLS is not really discussed or looked into by the statistics community.
My use case:
I have about 30-40 variables that potentially have an effect on a process outcome (the response variable). I also know that time, and a few other variables that are correlated with time, affect my response. Some of these relationships are not linear (and should never be linear, for example cell growth); instead they can be described by logistic, sigmoidal or negative exponential functions of time. I could manually transform each of these, but (1) this can be time-consuming, (2) it is prone to error and personal bias, and (3) I only know the exact relationship in some, not all, cases.
My understanding of PLS is that if you have a set of linearly correlated variables, you can simplify the model to a handful of variables. I don't think my response really depends on 30 variables, but I am having a huge problem talking to my statistician colleagues, who don't seem to grasp the problem. Meanwhile, my biologist colleagues think of PLS as a magic tool for figuring out which variables are important for the response ('throw everything into PLS and you'll get the answer').
When I looked at a scatter plot matrix of the variables and calculated the linear correlations, the R² values were quite low, but some of the plots are obviously non-linear. Hence my questioning of my colleagues' approach. In fact, most of the multivariate analyses that we do either fail to identify the predictor variables or explain only about 40 % of the response variance.
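To make the point about low linear R² concrete, here is a minimal Python sketch on made-up data (not my real dataset). It assumes a response that decays as a negative exponential of time with an arbitrary rate of 0.5; the helper names `pearson`, `ranks` and `spearman` are my own, not from any library. The idea is only to show that a perfectly deterministic but non-linear relationship can give a poor linear R², while a rank correlation or a suitable transform picks it up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n of the values (assumes no ties, true for the data below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: 30 time points, noise-free negative exponential response.
t = [float(i) for i in range(30)]
y = [math.exp(-0.5 * ti) for ti in t]

r2_linear = pearson(t, y) ** 2                        # linear R² on the raw data
rho = spearman(t, y)                                  # monotonic association
r2_log = pearson(t, [math.log(v) for v in y]) ** 2    # R² after a log transform

print(f"linear R2 = {r2_linear:.3f}, Spearman rho = {rho:.3f}, "
      f"R2 after log transform = {r2_log:.3f}")
```

On this toy example the linear R² comes out well below 0.5 even though the response is an exact function of time, whereas the rank correlation is -1 and the log-transformed data are linear. Real data with noise and unknown functional forms will of course be messier.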
In this particular case, I was able to manually figure out the real contributors to my response and build a linear model containing just a few variables (not a great model, though). The dataset was also small, with about 30 rows/data points; usually I am left with either splitting the dataset into 70 % training and 30 % test or doing k-fold cross-validation. On one hand, you could say I was cherry-picking my variables. On the other hand, how should I do this in a more 'statistical' manner?
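For reference, this is roughly what I mean by k-fold validation on a dataset of this size, as a minimal Python sketch. Everything here is an assumption for illustration: the data are synthetic (30 points, one predictor, Gaussian noise), the model is a single-predictor least-squares line, and `fit_line` and `k_fold_q2` are hypothetical helper names. The score is a cross-validated R² (often written Q²), computed as 1 minus the summed squared prediction error on held-out points divided by the total sum of squares around the overall mean, which is one common convention.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_q2(xs, ys, k=5, seed=0):
    """Cross-validated R² from k-fold CV with a shuffled, fixed-seed split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k disjoint folds covering all rows
    press = 0.0                             # summed squared held-out errors
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    my = sum(ys) / len(ys)
    tss = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / tss

# Synthetic stand-in for a 30-row dataset (not real measurements).
rng = random.Random(1)
xs = [float(i) for i in range(30)]
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

q2 = k_fold_q2(xs, ys, k=5)
print(f"5-fold cross-validated R2 = {q2:.3f}")
```

With 30 rows and k = 5, each model is fitted on 24 points and tested on 6. As far as I understand it, if the variables are picked by looking at the whole dataset first, that selection step would have to be repeated inside each fold; otherwise the cross-validated score is optimistic, which is exactly the cherry-picking concern.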
Sorry for the long reply, it is more of a rant.