Sum function not working

I am trying to sum a column, but when i change the df$column to as.numeric() it gives a totally wrong answer. Not sure how to fix this so i can get an appropriate sum.

> sum(as.numeric(df.main$Column.6.x))
[1] 636
> as.data.frame(df.main)
     Column.1                          Column.4.y Column.6.x Column.6.y
1     1010477           Wedge/Segburg/Cobblestone          1          1
2     1010488  Reclining Sofa/Segburg/Cobblestone          1          1
20    1150271    Sofa Chaise Sleeper/Jarreau/Gray          2          4
21    1150371    Sofa Chaise Sleeper/Jarreau/Blue          1          3
66    1640287     Reclining Power Sofa/Geoffstown          1          5
72    1770147         2 Seat Reclining Power Sofa          2          3
73    1770177          Wedge/Brassville/Graystone          3          4
77    1770247         2 Seat Reclining Power Sofa          3          6
78    1770277          Wedge/Brassville/Chocolate          1          2
87    1830315      PWR REC Sofa with ADJ Headrest          1          2
88    1870213  PWR Recliner/ADJ Headrest/Carrarse          0          6
89    1870215      PWR REC Sofa with ADJ Headrest          0          4
90    1870218      PWR REC Loveseat/CON/ADJ HDRST          0          3
92    1900213           PWR Recliner/ADJ Headrest          1          2
93    1900215      PWR REC Sofa with ADJ Headrest          1          4
94    1900218      PWR REC Loveseat/CON/ADJ HDRST          1          3
95    1910313           PWR Recliner/ADJ Headrest          4          6
96    1910314       PWR REC Loveseat/ADJ Headrest          5          7
97    1910315      PWR REC Sofa with ADJ Headrest          7          9
98    2050113           PWR Recliner/ADJ Headrest          3          8
99    2050115      PWR REC Sofa with ADJ Headrest          2          7
100   2050118      PWR REC Loveseat/CON/ADJ HDRST          2          7
105   2150613  PWR Recliner/ADJ Headrest/Composer          1          3
137   3380287   Reclining Power Sofa/Boxberg/Bark          1          2
157   3790248     LAF Reclining Loveseat/Leonberg          3          6
158   3790249 RAF REC Loveseat w/Console/Leonberg          3          6
159   3810248 LAF Reclining Loveseat/Walgast/Gray          0          1
160   3810249  RAF REC Loveseat w/Console/Walgast          0          1
161   3840081 2 Seat Reclining Sofa/Austere/Brown          1          1
227   5110377             Wedge/Coahoma/Dark Gray          2          2
229   5160115      PWR REC Sofa with ADJ Headrest          3          7
232   5160215      PWR REC Sofa with ADJ Headrest          3          4
248   5620213     PWR Recliner/ADJ Headrest/Duvic          1          2
249   5620215      PWR REC Sofa with ADJ Headrest          1          3
250   5620218      PWR REC Loveseat/CON/ADJ HDRST          1          2
289   7170115      PWR REC Sofa with ADJ Headrest          2          5
309   7540718      PWR REC Loveseat/CON/ADJ HDRST          7         10
310   7620425       Rocker Recliner/Mitchiner/Fog          4          6
311   7620489          REC Sofa w/Drop Down Table          3          8
312   7620494          DBL Rec Loveseat w/Console          2          6
314   7690188       Reclining Sofa/Capehorn/Earth          1          1
317   7690288     Reclining Sofa/Capehorn/Granite          3         10
319   7770188   Reclining Sofa/Brayburn/Chocolate          3          3
322   7810215      PWR REC Sofa with ADJ Headrest          1          1
324   7930777            Wedge/Vacherie/Chocolate          2          5
325   7930788   Reclining Sofa/Vacherie/Chocolate          2         12
343   8210588        Reclining Sofa/Kempten/Black          2          2
346   8310588     Reclining Sofa/Chivington/Earth          9         13
348   8470188         Reclining Sofa/Kilzer/Black          0          0
377   9520188  Reclining Sofa/Linebacker/Espresso          0          0
381   9860488          Reclining Sofa/Tulen/Mocha          4          6
383   9860688           Reclining Sofa/Tulen/Gray          1          2
1351 U6090047         2 Seat Reclining Power Sofa          1          4
1354 U9820047 2 Seat Reclining Power Sofa/Damacio          1          4
1356 U9820081       2 Seat Reclining Sofa/Damacio          0          0
1421  1770281    2 Seat Reclining Sofa/Brassville          0          1
1423  1790177             Wedge/Pittsfield/Fossil          0          1
1451  3380288         Reclining Sofa/Boxberg/Bark          0          1
1464  3530077          Wedge/Hallettsville/Saddle          0          1
1472  3810148      LAF Reclining Loveseat/Walgast          0          1
1473  3810149  RAF REC Loveseat w/Console/Walgast          0          1
1538  6070287     Reclining Power Sofa/Persiphone          0          1
1545  6470215      PWR REC Sofa with ADJ Headrest          0         -1
1560  6830377                Wedge/Grattis/Saddle          0          1
1575  7350887       Reclining Power Sofa/Calamine          0          1
1593  7530513         PWR Rocker REC/ADJ Headrest          0          1
1610  8920188        Reclining Sofa/Slayton/Mocha          0          1
2328 U6090071      Oversized Wedge/McCaskill/Gray          0          3
2329 U6090081     2 Seat Reclining Sofa/McCaskill          0          2
2334 U7260187 Reclining Power Sofa/Palladum/Metal          0          2
2335 U7260188       Reclining Sofa/Palladum/Metal          0          2
2342 U9820077            Wedge/Damacio/Dark Brown          0          3

Hi @bnjmn! Thanks for providing your data with this example, but I think we need to see a little more context here. In particular, there're a couple of odd things here:

  • You're calling as.data.frame() in order to print df.main. Is your data in another format, like a matrix?
  • The sum I would expect to see based on the data printed here is 110, but the row numbers along the side aren't continuous: they go 1, 2, 20, 21, 66, … . Is it possible that the conversion to a data frame is accidentally dropping a bunch of rows?

It might be helpful if you ran str(df.main) and showed us the output of that—it would show us the actual format and dimensions of the data. Hopefully that'll clue us in to what's going on :slightly_smiling_face:

1 Like

The reason I think it is not continuous is because this data frame comes from merging 2 cleaned and subset data frames. Here is the str()

> str(df.main)
'data.frame':	72 obs. of  4 variables:
 $ Column.1  : Factor w/ 2367 levels "1010477","1010488",..: 1 2 20 21 66 72 73 77 78 87 ...
 $ Column.4.y: Factor w/ 2033 levels "","2 Seat Reclining Power Sofa",..: 2019 1519 1653 1652 1505 2 2008 2 2007 1213 ...
 $ Column.6.x: Factor w/ 26 levels "-1","-2","-4",..: 5 5 12 5 5 12 16 16 5 5 ...
 $ Column.6.y: Factor w/ 46 levels "-1","-2","-6",..: 6 6 32 26 36 26 32 40 16 16 ...

I also tried df(droplevels(df)) but it didn't seem to do much for me.

Ahh, I see. Okay, I think I see a problem here: Column.6.x and Column.6.y are both factors. When they were originally converted to factors, the possible values of those columns were assigned numeric levels (and, optionally, labels). When you use as.numeric() on the factor directly, it's those levels you're getting (and because the levels and the original values are both numeric, nothing obviously weird is happening).

This SO answer explains things more succinctly than I could, and offers a solution:

So try using sum(as.numeric(levels(Column.6.x))[Column.6.x]) instead. FWIW, factors are often more of a pain than they're worth, and although they definitely have uses, they can be conceptually tricky to work with :persevere:

1 Like

Not sure if this has worked as I receive a total of 110, but I think this puts me on the right path. Once I figure it out fully I will post it here. I think I may need to define my own function and work from there.

Fair enough! Is it possible for you to go back to the source datasets and catch those columns before they become factors? I think because the values and the levels are so similar, you really need to verify which you're actually summing up in order to rule this out :smile:

First off, thanks for pointing me in the right direction.

Second, the answer is to download the package: varhandle

#run
df <- unfactor(df)
1 Like

You can convert factors containing numbers to numbers by

as.numeric(as.character(df.main$Column.6.x))
2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.