Using sqldf() in R to create a df that fulfills a count distinct condition?

I am new to SQL and new to R but I feel like this should work:

In RStudio I have a large data set that contains many ids (egoid) and many dates for each id.

I am starting with one year of fitness tracker measurements, but not every id has a measurement for every day. Some people quit. Some only wore their trackers half the time.

So I want to select only the ids that have measurements for 350 or more dates (out of a year with 366 days). First of all, I want to return all columns for the ids that fit the criteria. I tried using * after SELECT and got an error. Then I tried listing all my column names, but I get an error at the first comma after egoid. Here is the error, with the code below it:

Error: unexpected symbol in "Count350 <- sqldf(SELECT egoid"

Count350 <- sqldf(SELECT egoid, date, steps, sedentaryminutes, lightlyactiveminutes, fairlyactiveminutes, veryactiveminutes, totalcal FROM original_dataset WHERE SELECT distinct(count(date >350)) ORDER BY egoid)

The result I want is for the dataset to be whittled down to about 40 ids, with 350-366 rows (days) of data for each id.

If anything jumps out as wrong please let me know. Thank you!

If you are equally new to R and SQL, why complicate things by using sqldf()? That approach is most useful if you already know SQL and feel more comfortable doing data wrangling in SQL than in R. An easier way would be to use dplyr, but to give you specific advice it would help if you could provide a minimal REPRoducible EXample (reprex) illustrating your issue.
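
Without a reprex this is only a sketch, but two things stand out in the call as posted. First, sqldf() takes the query as a character string, so unquoted SQL is parsed as R code, which is what produces the "unexpected symbol" error. Second, "WHERE SELECT distinct(count(date >350))" is not valid SQL; one common way to express "ids with at least N distinct dates" is a subquery with GROUP BY and HAVING. The toy data frame, the min_days threshold, and the result names count_sql and count_dplyr below are made up for illustration; on the real data the threshold would be 350.

```r
library(sqldf)
library(dplyr)

# Hypothetical toy data standing in for original_dataset.
# Dates are kept as character strings to keep the example simple.
original_dataset <- data.frame(
  egoid = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  date  = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
            "2016-01-01", "2016-01-02",
            "2016-01-01", "2016-01-02", "2016-01-03"),
  steps = c(5000, 6200, 4800, 7100, 3000, 3500, 9000, 8700, 9100),
  stringsAsFactors = FALSE
)

# Assumed threshold for the toy data; use 350L for the real data set.
min_days <- 3L

# sqldf version: the whole query is passed as one quoted string.
# The subquery finds the ids with enough distinct dates, and the
# outer query returns every column for those ids.
query <- sprintf(
  "SELECT *
     FROM original_dataset
    WHERE egoid IN (SELECT egoid
                      FROM original_dataset
                     GROUP BY egoid
                    HAVING COUNT(DISTINCT date) >= %d)
    ORDER BY egoid, date",
  min_days
)
count_sql <- sqldf(query)

# dplyr version of the same filter: keep the groups (ids) whose
# number of distinct dates meets the threshold.
count_dplyr <- original_dataset %>%
  group_by(egoid) %>%
  filter(n_distinct(date) >= min_days) %>%
  ungroup() %>%
  arrange(egoid, date)
```

In the toy data, id 2 has only two dates, so both versions drop it and keep all rows for ids 1 and 3. If your date column is stored as a Date rather than as text, check how it comes back from sqldf() before relying on it.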
