# Shape characteristic in ggplot2

Thanks for your help, sir, but when I run the example code you sent, it throws an error like:
Warning: Ignoring unknown parameters: width, height
Error in filter(data, manufacturer == "audi" & model == "a4") :
1: In data.matrix(data) : NAs introduced by coercion

The same happens with my own data.
Can you please suggest what the problem is and how I can correct it?

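Oh, try library(tidyverse) or library(dplyr) to get filter(). Without dplyr loaded, filter() resolves to stats::filter(), which is meant for time series, not for subsetting rows.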

And yes, it was my mistake: I used geom_jitter() first, then replaced it with geom_point() and did not test it again. That's why the width and height parameters are ignored!

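``````
library(ggplot2)
library(dplyr)  # provides the filter() used below
# `data` is your data frame; the column names here match ggplot2's mpg data set
ggplot(data, aes(x = hwy, y = cty)) +
  geom_point(aes(shape = class)) +
  geom_smooth(method = "lm") +
  # overlay the highlighted subset in red
  geom_point(data = filter(data, manufacturer == "audi" & model == "a4"),
             colour = "red", size = 3)
``````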
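If loading dplyr is not an option, or another package masks filter(), a namespace-qualified call avoids the ambiguity. This is only a minimal sketch, and it assumes the data in this thread is ggplot2's mpg data set (the column names match, but that is an assumption); audi_a4 is a hypothetical name.

```r
library(ggplot2)  # provides the mpg example data

# Namespace-qualified call: always uses dplyr's filter(), never stats::filter()
audi_a4 <- dplyr::filter(mpg, manufacturer == "audi" & model == "a4")

# The subset can then be passed to a layer via its data argument
p <- ggplot(mpg, aes(x = hwy, y = cty)) +
  geom_point() +
  geom_point(data = audi_a4, colour = "red", size = 3)
```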

Can anyone please help me generate a plot similar to the one above? It needs these components: 1) point shapes per group, 2) diagonal parallel lines 1 log unit above and below the 1:1 line, 3) a linear regression for the whole data set (my data set has several groups, and if I use lm() it produces a line for every group).

It would be helpful if you could provide an example that achieves this.

As shown above:

1. When you put the shape mapping (aes(shape = ...)) inside geom_point(), it does not affect the fitted line.
2. Diagonal lines can be added with geom_abline(); the offset (intercept) depends on your scale.
``````
library(ggplot2)

ggplot(data = iris, aes(x = Sepal.Length,
                        # generate near 1:1 ratio
                        y = Petal.Length*(Sepal.Length-Petal.Length))) +
  geom_point(aes(shape = Species)) +
  geom_smooth(method = "lm") +
  # diagonal line at 1:1
  geom_abline(slope = 1, intercept = 0) +
  # upper line
  geom_abline(slope = 1, intercept = 1,
              linetype = "dotted") +
  # lower line
  geom_abline(slope = 1, intercept = - 1,
              linetype = "dashed") +
  theme_bw()
``````
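Point 1 can be checked by comparing two variants of the same plot. The sketch below uses iris with plain Sepal.Length and Petal.Length (an illustrative choice, not the axes from the plot above) and counts the fitted lines through layer_data(); the object names are hypothetical.

```r
library(ggplot2)

# Shape mapped in ggplot(): geom_smooth() inherits the grouping,
# so one regression line is fitted per Species
p_grouped <- ggplot(iris, aes(Sepal.Length, Petal.Length, shape = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Shape mapped only in geom_point(): geom_smooth() sees no grouping,
# so a single regression line is fitted to all rows
p_single <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point(aes(shape = Species)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Count the distinct groups in the smooth layer (layer 2) of each plot
n_lines_grouped <- length(unique(layer_data(p_grouped, 2)$group))
n_lines_single <- length(unique(layer_data(p_single, 2)$group))
```

If the shape mapping has to stay in ggplot() for some reason, adding aes(group = 1) to geom_smooth() should also force a single fit.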

Thank you so much, sir, I will try it out. One query, though: as you said above, to highlight a specific compound I should overlay a second geom_point().
I tried this on the iris data set using
.....The above code +geom_point(iris = filter(iris,Species=="sentosa"),colour= "red")
As a result, every point is overlaid with a red point. Why does this happen, and how can I correct it, Mr. Matthias?

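Use: geom_point(data = filter(iris,Species=="setosa"),colour= "red")
The argument has to be named data; iris = ... is not a known parameter, so the layer falls back to the full data set and colours every point red. The species is also spelled "setosa".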
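A complete version of that overlay might look like the sketch below. It uses base subset() instead of filter() so it runs without dplyr, and plain Sepal.Length vs. Petal.Length axes for brevity; setosa_only is a hypothetical name.

```r
library(ggplot2)

# Rows to highlight; note the spelling "setosa"
setosa_only <- subset(iris, Species == "setosa")

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  # all points, shaped by species
  geom_point(aes(shape = Species)) +
  # second layer drawn only from the subset, passed via `data =`
  geom_point(data = setosa_only, colour = "red", size = 3)

# Number of points actually drawn by the red overlay layer
n_highlighted <- nrow(layer_data(p, 2))
```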

Sir, can you please explain the Petal.Length*(Sepal.Length - Petal.Length) term? I could not understand what is actually happening there.

This data was generated from a multiple linear regression, and a data frame has been created to store the predicted values.
For convenience, this is what my data looks like; it is to be plotted as observed vs. predicted with a 1:1 line.

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

``````
testdata1 = tibble::tribble(
~observed, ~predicted_values,                              ~category,                        ~list,
1.6,         1.7662534,             "Monoaromatichydrocarbon",                    "Benzene",
1.92,          2.106053,             "Monoaromatichydrocarbon",                    "Toluene",
2.51,         2.4269167,             "Monoaromatichydrocarbon",                   "p-Xylene",
2.35,         2.4461834,             "Monoaromatichydrocarbon",                   "o-Xylene",
2.19,         2.4504166,             "Monoaromatichydrocarbon",               "Ethylbenzene",
2.82,         2.7491294,             "Monoaromatichydrocarbon",     "1,3,5-trimethylbenzene",
2.8,          2.765026,             "Monoaromatichydrocarbon",     "1,2,3-trimethylbenzene",
3.12,         3.1288376,             "Monoaromatichydrocarbon", "1,2,4,5-tetramethylbenzene",
2.87,         2.7956433,             "Monoaromatichydrocarbon",            "n-propylbenzene",
3.39,         3.1341133,             "Monoaromatichydrocarbon",             "n-butylbenzene",
2.25,         2.2123077, "Monoaromatichalogenatedhydrocarbon",              "Chlorobenzene",
2.59,         2.6237682, "Monoaromatichalogenatedhydrocarbon",        "1,2-dichlorobenzene",
2.65,         2.6376784, "Monoaromatichalogenatedhydrocarbon",         "1,4-dichlorobezene",
2.47,         2.6665618, "Monoaromatichalogenatedhydrocarbon",        "1,3-dichlorobenzene",
3.22,         3.0837152, "Monoaromatichalogenatedhydrocarbon",     "1,2,3-trichlorobenzene",
3.25,         3.0698757, "Monoaromatichalogenatedhydrocarbon",     "1,2,4-trichlorobenzene",
3.84,         3.4756695, "Monoaromatichalogenatedhydrocarbon", "1,2,3,4-tetrachlorobenzene",
3.93,         3.4918422, "Monoaromatichalogenatedhydrocarbon", "1,2,4,5-tetrachlorobenzene"
)
#> # A tibble: 6 x 4
#>   observed predicted_values category                list
#>      <dbl>            <dbl> <chr>                   <chr>
#> 1     1.6              1.77 Monoaromatichydrocarbon Benzene
#> 2     1.92             2.11 Monoaromatichydrocarbon Toluene
#> 3     2.51             2.43 Monoaromatichydrocarbon p-Xylene
#> 4     2.35             2.45 Monoaromatichydrocarbon o-Xylene
#> 5     2.19             2.45 Monoaromatichydrocarbon Ethylbenzene
#> 6     2.82             2.75 Monoaromatichydrocarbon 1,3,5-trimethylbenzene
``````

The data is to be plotted with a 1:1 diagonal line and a linear regression, like the graph above by Mr. Matthias. Please help me with this, sir.

This can work as a starting point

``````
library(tidyverse)

testdata1 = tibble::tribble(
~observed, ~predicted_values,                              ~category,                        ~list,
1.6,         1.7662534,             "Monoaromatichydrocarbon",                    "Benzene",
1.92,          2.106053,             "Monoaromatichydrocarbon",                    "Toluene",
2.51,         2.4269167,             "Monoaromatichydrocarbon",                   "p-Xylene",
2.35,         2.4461834,             "Monoaromatichydrocarbon",                   "o-Xylene",
2.19,         2.4504166,             "Monoaromatichydrocarbon",               "Ethylbenzene",
2.82,         2.7491294,             "Monoaromatichydrocarbon",     "1,3,5-trimethylbenzene",
2.8,          2.765026,             "Monoaromatichydrocarbon",     "1,2,3-trimethylbenzene",
3.12,         3.1288376,             "Monoaromatichydrocarbon", "1,2,4,5-tetramethylbenzene",
2.87,         2.7956433,             "Monoaromatichydrocarbon",            "n-propylbenzene",
3.39,         3.1341133,             "Monoaromatichydrocarbon",             "n-butylbenzene",
2.25,         2.2123077, "Monoaromatichalogenatedhydrocarbon",              "Chlorobenzene",
2.59,         2.6237682, "Monoaromatichalogenatedhydrocarbon",        "1,2-dichlorobenzene",
2.65,         2.6376784, "Monoaromatichalogenatedhydrocarbon",         "1,4-dichlorobezene",
2.47,         2.6665618, "Monoaromatichalogenatedhydrocarbon",        "1,3-dichlorobenzene",
3.22,         3.0837152, "Monoaromatichalogenatedhydrocarbon",     "1,2,3-trichlorobenzene",
3.25,         3.0698757, "Monoaromatichalogenatedhydrocarbon",     "1,2,4-trichlorobenzene",
3.84,         3.4756695, "Monoaromatichalogenatedhydrocarbon", "1,2,3,4-tetrachlorobenzene",
3.93,         3.4918422, "Monoaromatichalogenatedhydrocarbon", "1,2,4,5-tetrachlorobenzene"
)

testdata1 %>%
  ggplot(aes(x = observed, y = predicted_values)) +
  geom_point(aes(shape = category)) +
  geom_abline(slope = 1, intercept = 0) +
  geom_abline(slope = 1, intercept = 1,
              linetype = "dotted") +
  geom_abline(slope = 1, intercept = - 1,
              linetype = "dashed") +
  scale_x_continuous(limits = c(0, 4)) +
  scale_y_continuous(limits = c(0, 4)) +
  geom_smooth(method = "lm", color = "black") +
  coord_equal()
#> `geom_smooth()` using formula 'y ~ x'
``````

ggplot(data = predict_new4,aes(x = observed, y = predicted_values)) +
geom_point(aes(shape = category)) + geom_smooth(method = "lm")+
geom_abline(slope = 1, intercept = 0) +
geom_abline(slope = 1, intercept = 1,linetype = "dotted") +
geom_abline(slope = 1, intercept = - 1,linetype = "dashed") +
scale_x_continuous(limits = c(0, 4)) +
scale_y_continuous(limits = c(0, 4)) +
geom_smooth(method = "lm", color = "black") +
coord_equal()

When I use the complete data set, these warnings appear:

Warning messages:
1: Removed 23 rows containing non-finite values (stat_smooth).
2: Removed 23 rows containing non-finite values (stat_smooth).
3: The shape palette can deal with a maximum of 6 discrete values because more
than 6 becomes difficult to discriminate; you have 7. Consider specifying
shapes manually if you must have them.
4: Removed 42 rows containing missing values (geom_point).

What should I do to eliminate these, sir? I can understand the warning about the 7th category, but why does it remove the data?

Those are just warnings you get because you don't have an equal number of observations for all the categories and because you have too many categories to be individually represented by point shapes.

The only solution for this would be for you to rethink the way you are representing the data.

But I have all the data in both predicted and observed. How can I give a specific shape to the 7th category, which was not considered?
Now I get it: the rows were removed because the 7th category was not given a symbol, so its data points were dropped. So how can I give a specific symbol to the 7th category, sir?

There are 42 data points that do not have observations, either in one or in both of the conditions.
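One way to see which rows a plot would drop is to test them against the axis limits yourself: with limits = c(0, 4), ggplot2 treats values outside the limits like missing values. The sketch below runs on a hypothetical toy data frame, not the real predict_new4, and in_range() is a helper made up for illustration.

```r
# Hypothetical stand-in for the full data set
toy <- data.frame(
  observed         = c(1.5, NA, 4.6, 2.2),
  predicted_values = c(1.4, 2.0, 4.1, NA),
  category         = c("a", "b", "c", "a")
)

lims <- c(0, 4)  # same limits as scale_x_continuous() / scale_y_continuous() above

# TRUE where a value is present and inside the limits
in_range <- function(v) !is.na(v) & v >= lims[1] & v <= lims[2]

# A row is dropped if either coordinate is missing or out of range
dropped <- !(in_range(toy$observed) & in_range(toy$predicted_values))

n_dropped <- sum(dropped)
dropped_per_category <- table(toy$category[dropped])
```

Comparing n_dropped and the per-category counts with the numbers in the warnings can show whether the rows are lost to the limits, to missing values, or to a category that received no shape.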

Yes but you are grouping by category and you don't have an equal number of observations within each category.

No sir, I get it now: the data belonging to the category without a shape has been removed, because the number of values removed equals the size of that category. So I can bring it back by giving that category a specific shape.

The polar chemicals group is not assigned a shape and is thus removed.

So how can I give it a specific symbol so that it is considered?

The polar chemicals group is not assigned a shape because the palette allows a maximum of 6 only.
The only way I can bring it back is by giving it (i.e. polar chemicals) a specific symbol.
Is there any way to give it a specific symbol?

Okay, well, that's true. I wasn't aware of this; you are right, you need to define the shapes yourself when you have more than 6 different categories.
You can do this with scale_shape_manual(), e.g. add the line:
scale_shape_manual(values = c(0:6))

To have more control you can define the shapes you want to use, e.g.:
scale_shape_manual(values = c(0,1,2,3,4,5,6,8,10))
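As a minimal sketch of how this fits into a full plot, the example below uses a hypothetical toy data frame with seven made-up categories (not the real data from this thread) and checks that every row receives a shape.

```r
library(ggplot2)

# Hypothetical data: seven categories, one more than the default palette handles
toy <- data.frame(
  observed         = seq(0.5, 3.5, by = 0.5),
  predicted_values = seq(0.5, 3.5, by = 0.5) + 0.1,
  category         = paste("group", 1:7)
)

p <- ggplot(toy, aes(x = observed, y = predicted_values)) +
  geom_point(aes(shape = category), size = 3) +
  # one shape number per category, so no category is left without a symbol
  scale_shape_manual(values = c(0, 1, 2, 3, 4, 5, 6))

# Shapes actually assigned to the points in layer 1
shapes_used <- layer_data(p, 1)$shape
```

A named vector, with category names as the names and shape numbers as the values, can be passed to values instead if a particular category should get a particular symbol.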

You can choose the numbers you want and look them up, for example, here:
