How can I reorganize a messy spreadsheet to do statistics in R?

Hi,

It feels terrific to publish my first post in the RStudio community.

The attached spreadsheet has a strange format, and I need to reorganize it before I can run statistical analyses.

This spreadsheet is the output of a Praat script which measures the duration of labeled segments in an audio file.

Each label (for example "cc1bsf12" highlighted in the screenshot) occurred a maximum of five times in the audio file.

Ideally, I thought I would have one column per label, with the label (for example "cc1bsf12") as the header and its five measures in the column. But I would appreciate hearing from you about whether this is a good idea, and how to do it.

How would you reorganize the data?

Thank you for your help.

Best,
Dan

For data processing and further calculations, a long format would be more suitable: three columns, file_id, segment, and duration. For this you can use tidyr::pivot_longer().
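
A minimal sketch of that reshaping, assuming the columns have already been named file_id, segment_1, duration_1, segment_2, duration_2, and so on. Those column names, the file names, the second label, and all duration values are made up for illustration; only the label "cc1bsf12" comes from the original post.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per audio file, followed by
# (label, duration) pairs. All values are invented.
wide <- tribble(
  ~file_id,   ~segment_1, ~duration_1, ~segment_2, ~duration_2,
  "rec1.wav", "cc1bsf12", 0.112,       "cc1bsf13", 0.098,
  "rec2.wav", "cc1bsf12", 0.120,       "cc1bsf13", 0.101
)

# ".value" tells pivot_longer() that the part of each name before "_"
# ("segment" or "duration") becomes an output column, while the part
# after "_" (the pair number) is stored in a new column called "pair".
long <- pivot_longer(
  wide,
  cols = -file_id,
  names_to = c(".value", "pair"),
  names_sep = "_"
)

long
```

With the data in this shape, a per-label summary is a one-liner, for example aggregate(duration ~ segment, data = long, FUN = mean).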

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

Dear Andrés,

Thank you so much! I would really appreciate receiving more help.

  1. Based on the screenshot I shared, should I add headers to the columns before using pivot_longer?

  2. Should the final state be a dataframe where only columns A, B, and C are there? With column pairs D-E, F-G, etc, being put under columns B-C?

  3. May I share my .csv as a reprex?

Thank you for your generous help.
Dan

Yes, you should

Yes, that's what I meant. It would be closer to a normalized structure.
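
As a rough illustration of the header step, the sketch below reads a headerless export and generates names programmatically. It assumes the layout discussed above (first column is the file, then label/duration pairs); the inline CSV text and the file_id, segment_N, and duration_N names are hypothetical.

```r
# Hypothetical headerless export: file id, then (label, duration) pairs.
csv_text <- "rec1.wav,cc1bsf12,0.112,cc1bsf13,0.098
rec2.wav,cc1bsf12,0.120,cc1bsf13,0.101"

# header = FALSE keeps the first data row from being used as column names.
raw <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Every column after the first belongs to a (label, duration) pair.
n_pairs <- (ncol(raw) - 1) / 2

# Build names: file_id, segment_1, duration_1, segment_2, duration_2, ...
names(raw) <- c(
  "file_id",
  paste0(rep(c("segment", "duration"), times = n_pairs), "_",
         rep(seq_len(n_pairs), each = 2))
)

names(raw)
```

For a real file, the text = csv_text argument would be replaced by the path to the .csv. Names of this form can then be split on "_" by tidyr::pivot_longer().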

Yes, you can share a link to it, but be aware that some people here do not like downloading random files from the internet for security reasons. You might lower your chances of getting help compared to providing sample data in a copy/paste-friendly format, as described in the reprex guide I linked for you before.
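
One common way to produce copy/paste-friendly sample data is base R's dput(), which prints an R expression that recreates the object. The sketch below uses an invented data frame; in practice it would be a small slice of the real data.

```r
# Hypothetical sample; replace with a few rows of the real data.
sample_data <- data.frame(
  file_id  = c("rec1.wav", "rec2.wav"),
  segment  = c("cc1bsf12", "cc1bsf12"),
  duration = c(0.112, 0.120)
)

# Print a text representation that others can paste into their R session.
dput(head(sample_data, 5))
```

Pasting the printed structure(...) expression into a post lets readers rebuild the data frame without downloading anything.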

I am attaching a screenshot displaying what the spreadsheet might look like after applying Andrés's suggestion.

I hope this post will help other users in the future.

Thank you again, Andrés.

Best,
Dan
