I am new to both SQL and R, but I feel like this should work:
In RStudio I have a large dataset that contains many ids (egoid) and many dates for each id.
I am starting with one year of fitness tracker measurements, but not every id has measurements for every day. Some people quit. Some only wore their trackers half the time.
So I want to select only the ids that have measurements on 350 or more dates (out of a year with 366 days). First of all, I want to return all columns for the ids that meet this criterion. I tried using * after SELECT and got an error. Then I tried listing all my column names, but I get an error after the first comma after egoid. Here is the error, with the syntax below:
Error: unexpected symbol in "Count350 <- sqldf(SELECT egoid"
Count350 <- sqldf(SELECT egoid, date, steps, sedentaryminutes, lightlyactiveminutes, fairlyactiveminutes, veryactiveminutes, totalcal FROM original_dataset WHERE SELECT distinct(count(date >350)) ORDER BY egoid)
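A likely fix, sketched under a couple of assumptions: sqldf() expects the whole query as one character string. Unquoted SQL is parsed as R code, which is what produces "unexpected symbol" (and probably the earlier error with * as well). The filter itself also needs a subquery: group by egoid and keep ids with HAVING COUNT(DISTINCT date) >= 350. In SQL, count(date > 350) only counts rows where the comparison is non-missing, not distinct dates, and WHERE SELECT is not valid syntax. The toy data frame below is hypothetical and only stands in for original_dataset; the sketch assumes the sqldf package is installed.

```r
# Assumes the sqldf package is installed: install.packages("sqldf")
library(sqldf)

# Hypothetical toy data standing in for original_dataset:
# egoid 1 has 360 days of measurements, egoid 2 has only 100.
original_dataset <- data.frame(
  egoid = c(rep(1L, 360), rep(2L, 100)),
  date  = c(seq(as.Date("2016-01-01"), by = "day", length.out = 360),
            seq(as.Date("2016-01-01"), by = "day", length.out = 100)),
  steps = 1000L
)

# The query is passed to sqldf() as a single quoted string.
# The subquery finds ids with at least 350 distinct dates;
# the outer query returns every column for those ids.
Count350 <- sqldf("
  SELECT *
  FROM original_dataset
  WHERE egoid IN (
    SELECT egoid
    FROM original_dataset
    GROUP BY egoid
    HAVING COUNT(DISTINCT date) >= 350
  )
  ORDER BY egoid, date
")
```

With the real data, drop the toy data frame and run only the sqldf() call. Because the query is now a string, SELECT * returns every column, so the column names do not need to be listed.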
The result I want is the dataset whittled down to about 40 ids, with 350 to 366 rows (days) of data for each id.
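The same filter can also be written in base R without SQL, which makes it easy to check the result against the expected outcome (about 40 ids, each with 350 to 366 rows). This is a sketch on the same kind of hypothetical toy data; with the real data, only the lines after the toy data frame are needed.

```r
# Hypothetical toy data standing in for original_dataset:
# egoid 1 has 360 days of measurements, egoid 2 has only 100.
original_dataset <- data.frame(
  egoid = c(rep(1L, 360), rep(2L, 100)),
  date  = c(seq(as.Date("2016-01-01"), by = "day", length.out = 360),
            seq(as.Date("2016-01-01"), by = "day", length.out = 100))
)

# Number of distinct dates recorded for each id.
days_per_id <- tapply(original_dataset$date, original_dataset$egoid,
                      function(d) length(unique(d)))

# Ids with at least 350 distinct days of data.
keep_ids <- names(days_per_id)[days_per_id >= 350]

# All rows (and all columns) for those ids, sorted by id, then date.
Count350 <- original_dataset[original_dataset$egoid %in% keep_ids, ]
Count350 <- Count350[order(Count350$egoid, Count350$date), ]

# Check: how many ids survived, and how many rows each one has.
rows_per_id <- table(as.character(Count350$egoid))
length(rows_per_id)  # number of ids kept
range(rows_per_id)   # fewest and most rows per kept id
```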
If anything jumps out as wrong please let me know. Thank you!