University of Gothenburg

Powering Large-Scale ESG Research at the University of Gothenburg

Academic Research | Europe

2.7M+ text chunks analyzed
>90% classification accuracy
Custom data engineering

The Challenge

Researchers at the University of Gothenburg’s School of Business, Economics and Law were conducting a large-scale study of environmental disclosures across the European market. The scope was immense: analyzing the annual reports of every listed company in Europe to detect specific ESG patterns. However, they hit a "Noise Wall." Raw regulatory feeds are often mislabeled: a filing tagged "Annual Report" might actually be a 2-page press release or a 500-page sustainability addendum. The research team spent weeks developing complex heuristics, based on file-size bins and page counts, just to identify the correct source documents, which delayed the actual NLP analysis.

The Solution

FinancialReports provided a bespoke Bulk Data Export combined with an iterative data engineering partnership.

  1. Precision Classification: We worked directly with the research team to refine our document classification engine. By combining our metadata with their specific requirements, we achieved a classification accuracy of over 90%, distinguishing true Annual Reports from auxiliary filings.
  2. Markdown at Scale: Instead of forcing the university to OCR terabytes of PDFs, we delivered the dataset in clean, structured Markdown. This let their GPU clusters ingest the text immediately, with no PDF-parsing or OCR step.
  3. Metadata Enrichment: We enriched the dataset with verified primary ISIN codes, enabling the researchers to link their textual findings directly to financial performance data.

The Result

The university successfully processed over 2.7 million text chunks, isolating accurate environmental signals in complex sectors such as Mining, Oil, and Gas. By offloading document classification and parsing to FinancialReports, the research team shifted its focus from data cleaning to training its language models, significantly accelerating its publication timeline.

"The quality of the Markdown conversion is excellent. It allowed us to bypass the PDF parsing nightmare and run our models directly on millions of text chunks. FinancialReports didn't just hand us a dataset; they worked with us to refine the classification logic until it met our rigorous academic standards."
Mari Paananen
Associate Professor, University of Gothenburg
