Exploring AI and TDM in Life Sciences, Materials Science, and Fintech

The Link
By: Saskia Hoving, Thu Jan 9 2025

The webinar 'An intro to text and data mining for data scientists' explored the advanced techniques of TDM combined with AI, showcasing four case studies drawn from various disciplines. These examples — two from biomedical sciences, one from materials science, and one from financial technology (fintech) — illustrate the potential of combining TDM and AI tools on Springer Nature’s body of published research, accessed via a powerful Application Programming Interface (API). In this webinar summary, Dr. Prathik Roy and Eddie Bates explain how you can use the API for these diverse projects.

How AI extends TDM’s power

In the beginning, TDM was mainly about finding insights hidden in published research, because the volume of published material is far too large for anyone to read in full. But with the growth of large foundation models and other machine learning and deep learning models, data scientists can now use the corpus of published research to train their own models. These models can then provide predictive and prescriptive analytics, rather than just descriptive analytics. Google DeepMind’s AlphaFold, for example, which predicts how proteins fold, shows how powerful these tools can be.

Biomedical sciences, use case 1: BenevolentAI

When you can combine data from across multiple sources — from clinical trial reports, patents, journal and book publications, and patient records — you find over 1 billion relationships between genes, symptoms, diseases, proteins, tissues, species, and candidate drugs.

BenevolentAI, a company that describes itself as applying advanced AI to accelerating biopharma drug discovery, has been using these data sets to train and build models to find the genes associated with medical conditions, and then to link that to candidate compounds that can act on those conditions. The company was even able to identify possible candidates for treating Covid-19 symptoms.
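To make the idea of "relationships" between entities concrete, here is a minimal sketch of co-occurrence mining, one of the simplest TDM techniques: two terms that appear in the same sentence are counted as a candidate relationship. This is an illustration only and not BenevolentAI's method. The vocabulary, the sentences, and the function names are all made up for the example, and a real pipeline would use curated ontologies and proper entity recognition rather than substring matching.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-vocabulary (assumption); real projects use curated ontologies.
ENTITIES = {
    "gene": ["BRCA1", "TP53"],
    "disease": ["breast cancer", "lung cancer"],
}

def find_entities(sentence):
    """Return the known terms mentioned in a sentence (naive substring match)."""
    lowered = sentence.lower()
    return sorted(
        term
        for terms in ENTITIES.values()
        for term in terms
        if term.lower() in lowered
    )

def cooccurrences(sentences):
    """Count how often each pair of known terms shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        for pair in combinations(find_entities(sentence), 2):
            counts[pair] += 1
    return counts

# Toy sentences written for this example, not taken from any corpus.
SENTENCES = [
    "BRCA1 mutations raise the risk of breast cancer.",
    "TP53 is frequently altered in lung cancer and breast cancer.",
    "BRCA1 carriers were screened for breast cancer.",
]

pair_counts = cooccurrences(SENTENCES)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs that co-occur more often than chance would suggest become candidate relationships, which downstream models or human experts can then assess.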

Biomedical sciences, use case 2: CiteAb

CiteAb, a company that describes itself as a reagent search engine, built and trained models to extract reagent information from literature. They did this by first using expert humans to build a sample model of reagent and antibody information, and they then trained AI on this model. They’re then able to continue to enhance and refine the model, and it can quickly find and extract reagent and antibody information from the literature.

Materials science use case: Semiconductor design

AI models applied to TDM on materials data have extended what’s possible with these data. At first, you could use TDM to find crystal structure data in comprehensive materials databases, and derive materials properties from it.

The next stage was to use a material’s composition to find both its structure and properties. Now, AI models can use predictive analytics to generate statistically driven materials design. That means you can use chemical and physical data to design a material with the desired composition, structure, and properties, and then run virtual experiments on it before you synthesize it and evaluate it in the real world.

So far, this approach has proven highly impactful in semiconductor design, which in turn fuels integrated circuit (IC) and chip design. This has reduced the schedule overrun for IC design to less than 10%, and reduced project duration by up to 10%.

Financial technology use case

Even financial firms have become interested in using TDM for published research. Applying these models and TDM to a research corpus has allowed these companies to understand and analyse supply chains, especially for chemical manufacturing. It has also helped them understand how R&D companies’ research patterns can predict how those companies’ stocks might perform in the market.

Harnessing research data with API and FAIR principles

These use cases are built on Springer Nature’s research corpus, and the API provides access to the data these models use. That means there are two parts to this: the quality of the data, and the access to it.

Springer Nature has built — and continues to build — this data meticulously. We attract authors for our journals and books with frontline support for authors, reviewers, and editors across all stages, rigorously validate those submissions with first-rate peer review, and transform those manuscripts into best-in-class published articles, books, and databases.

Of course, those data can’t power these models if the models can’t easily get at them. That’s why Springer Nature works to make our whole database FAIR, which means:

  • Findable: Rigorous metadata and other elements feeding into discoverability platforms and/or apps. 
  • Accessible: The material must be both human and machine readable and actionable, made as openly available as possible.
  • Interoperable: Structuring the data for use in as many digital lab use cases as possible, with specialized metadata vocabularies.
  • Reusable: Validated data linked directly to associated research elements.

The next puzzle piece is the API that provides the link between Springer Nature’s data and the models and machines that use it.
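As a rough illustration of what programmatic access typically looks like, the sketch below builds a search request URL for a REST-style metadata API. Everything specific in it is an assumption: the base URL is a placeholder, and the parameter names (`q`, `api_key`, `p`, `s`) are hypothetical stand-ins. Check the Springer Nature API documentation for the actual endpoints, parameters, and terms of use.

```python
from urllib.parse import urlencode

def build_search_url(base_url, query, api_key, page_size=10, start=1):
    """Build a search URL; all parameter names here are hypothetical."""
    params = {
        "q": query,          # search terms (assumed parameter name)
        "api_key": api_key,  # personal key issued by the provider (assumed)
        "p": page_size,      # results per page (assumed)
        "s": start,          # index of the first result (assumed)
    }
    return f"{base_url}?{urlencode(params)}"

# Placeholder endpoint and key, for illustration only.
url = build_search_url(
    "https://api.example.org/search",
    "perovskite solar cell",
    "YOUR_API_KEY",
)
print(url)
```

A TDM pipeline would send requests like this in a loop, advancing the start index to page through results, and then pass the returned metadata or full text to the mining or model-training step.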


The webinar — which runs for a little less than an hour including Q&A — walks you through these case studies, to show what could be possible for you and your institution. Watch the webinar here, and then find out how to learn more and take the next steps.

Author: Saskia Hoving

In the Dordrecht office, Marketing Manager Saskia Hoving is chief editor of The Link Newsletter and The Link Blog, covering trends & insights for all facilitators of research. Focusing on the evolving role of libraries regarding SDGs, Open Science, and researcher support, she explores academia's intersection with societal progress. With a lifelong passion for sports and recent exploration into "Women's inclusion in today's science", Saskia brings dynamic insights to her work.