Image analysis using supervised and unsupervised image classification

My task is to perform image analysis on the cover pages of 593 CSR reports and present the findings in a decision-useful manner in R. Can anyone please tell me how I should proceed?

Hint: Basic image analysis and unsupervised and supervised classification.

Here are two sample CSR cover page images.

1: https://i.stack.imgur.com/OyIs7.jpg 2: https://i.stack.imgur.com/O7Eml.jpg

Here is the link to all 593 reports. https://onedrive.live.com/?authkey=!ADiYNfFR-9BiCag&id=906A15D602127D79!585533&cid=906A15D602127D79

What kind of decision? That will greatly change the type of analysis to run. Especially since "basic image analysis" may not give you very useful information.

Is it a school assignment?

Hey Alexis, thanks for the reply. No, this is my university project.
My main task is to help regulators and other stakeholders understand what significant message a firm projects through its cover page, and how that message has evolved over the years within a sector, a size cluster, or individual firms.

I was thinking of running OCR on the cover pages of all the PDFs, then applying sentiment analysis to the extracted text and building a word cloud from it.
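
To make that plan concrete, here is a minimal base R sketch of the steps that come after OCR. It assumes the cover text has already been extracted (for example with a package such as tesseract, which is not shown here). The cover texts, the names `cover_text` and `tokenize`, and the tiny word lists are all made up for illustration and are not taken from the reports.

```r
# Hypothetical OCR output for two cover pages (made-up text, not from the reports).
cover_text <- c(
  acme_2015 = "Building a sustainable future together",
  acme_2016 = "Sustainable growth, responsible energy, shared future"
)

# Replace anything that is not a letter or space, lower-case, and split into words.
tokenize <- function(x) {
  x <- tolower(gsub("[^A-Za-z ]", " ", x))
  words <- unlist(strsplit(x, " +"))
  words[nchar(words) > 0]
}

tokens <- lapply(cover_text, tokenize)

# Word frequencies across all covers: the counts a word cloud would be drawn from.
word_freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# Toy sentiment lexicon (an assumption; a real analysis would use a published lexicon).
positive <- c("sustainable", "responsible", "together", "shared", "growth")
negative <- c("risk", "crisis", "loss")

# Sentiment score per cover: number of positive words minus number of negative words.
sentiment <- vapply(
  tokens,
  function(w) sum(w %in% positive) - sum(w %in% negative),
  integer(1)
)
```

Cover pages often carry only a title and a slogan, so the scores may be based on very few words per report.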

This is the whole task.

Cover page image data of the CSR reports in a particular industry

In the OneDrive folder “CSR and Sustainability reports”, link under “Data” below, you will find an rds file “GRIexcel.rds” with data points on various firms across the globe and various industries filing their CSR or sustainability reports with the GRI. In the same OneDrive folder, you’ll find a subfolder “Pdfs” which contains some 25000 CSR reports across various sectors. The file names consist of the “Company name” and “Year” of the report separated by “_”. “Company name” is the “Name” variable and “Year” is the “Publication Year” variable from the rds file; these are used as the naming convention for the data.
Since different sectors have different stakeholders, firms usually cater to the needs of the stakeholders they perceive as most important, and CSR reporting is usually tailored accordingly. There is, however, debate on the level of importance of a particular stakeholder across the various sectors. Nonetheless, it has been well established in past research that the picture on the cover page of these reports projects a significant message about the firm and its vision. The regulators are thus now interested to know what significant message a firm projects through its cover page and how it has evolved over the years within a sector, a size cluster, or individual firms. Your task here is to perform an image analysis on the cover page of these reports and present your findings in a decision-useful manner.

Hint: start with basic image analysis, then move on to unsupervised and supervised classification.
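
As a rough illustration of what the hint could mean in base R, the sketch below computes a very basic image feature (mean red, green and blue per cover) and then groups covers with k-means as the unsupervised step. The covers here are small synthetic arrays, not the real reports. `make_cover` and `mean_rgb` are hypothetical helper names, and the choice of mean colour and of two clusters is only an assumption for the example.

```r
set.seed(1)

# Synthetic stand-ins for cover images: 8 x 8 x 3 arrays of RGB values in [0, 1].
# Real covers would first have to be rendered from the PDFs into arrays like these.
make_cover <- function(base_rgb) {
  img <- array(0, dim = c(8, 8, 3))
  for (ch in 1:3) {
    vals <- base_rgb[ch] + rnorm(64, sd = 0.05)
    img[, , ch] <- pmin(pmax(vals, 0), 1)  # keep values inside [0, 1]
  }
  img
}

covers <- c(
  replicate(3, make_cover(c(0.2, 0.7, 0.3)), simplify = FALSE),  # greenish covers
  replicate(3, make_cover(c(0.2, 0.3, 0.8)), simplify = FALSE)   # bluish covers
)

# Basic image analysis: one mean value per colour channel for each cover.
mean_rgb <- function(img) apply(img, 3, mean)
features <- t(vapply(covers, mean_rgb, numeric(3)))
colnames(features) <- c("red", "green", "blue")

# Unsupervised step: k-means groups covers with similar average colour.
fit <- kmeans(features, centers = 2, nstart = 10)
clusters <- fit$cluster
```

For the supervised step, the same feature matrix could be paired with hand-assigned labels (for example a theme per cover) and passed to a classifier. Richer features than mean colour would probably be needed for the real covers.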

Data
CSR data (Text and Image data)

https://1drv.ms/u/s!Anl9EgLWFWqQooxqiRwHJJope8D9vA?e=TGalJ1

Important: The pdf files will require pre-processing in order to facilitate efficient text and image analysis.
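
One small piece of that pre-processing is recovering the company and year from each file name, following the “Company name”_“Year” convention described in the task, so that results can be joined back to the rds data. The sketch below is base R only. The file names are invented, and it assumes the year is always the part after the last underscore and that the extension is .pdf.

```r
# Hypothetical file names following the "Company name_Year" convention.
files <- c("Acme Corp_2015.pdf", "Acme Corp_2016.pdf", "Nordic_Energy_2014.pdf")

# Drop the extension, then split on the LAST underscore so that
# underscores inside a company name are kept (an assumption about the data).
stem <- sub("\\.pdf$", "", files, ignore.case = TRUE)

reports <- data.frame(
  file = files,
  company = sub("_[^_]*$", "", stem),       # everything before the last "_"
  year = as.integer(sub("^.*_", "", stem)), # everything after the last "_"
  stringsAsFactors = FALSE
)
```

The resulting `company` and `year` columns could then be matched against the “Name” and “Publication Year” variables to attach sector and size information to each cover.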

The industry assigned to me is at this link.

https://onedrive.live.com/?authkey=!ADiYNfFR-9BiCag&id=906A15D602127D79!585525&cid=906A15D602127D79
