TDM and APIs: An introduction for information professionals

The Link
By: Emily Esposito, Mon Jul 15 2024

Text and Data Mining (TDM) offers researchers a way to deal with the vast amounts of information available for their work and to get the most out of it. With TDM, researchers can derive actionable insights from complex datasets, leading to informed decisions and innovations. From academic research to healthcare diagnostics and market trend analysis, TDM is transforming raw data into valuable knowledge. Publishers are increasingly partnering with librarians to develop licensing agreements and resources that enable institutions to access the benefits of TDM. Read on for an introduction to TDM, its uses for researchers, and how to harness its valuable outputs through API solutions.

In a recent webinar, Dr Prathik Roy, Product Director, Data Solutions and Strategy at Springer Nature, offered an introduction to TDM for information professionals. He covered the purpose and uses of TDM, explained how TDM outputs are delivered through Application Programming Interfaces (APIs), and introduced Springer Nature’s APIs and their licensing. Whether you’re a complete novice or already familiar with TDM and APIs, the webinar is a useful grounding in the basics. Here are the main topics that were covered.

TDM: Handling vast amounts of information and gaining insights

Researchers today are faced with so much information that it is challenging merely to identify which of it is relevant to their work, let alone to retrieve it and generate insights from it.

TDM is the automated process of selecting and analysing large volumes of text or data resources for purposes such as searching, finding patterns, discovering relationships, semantic analysis, and more. It is essential for researchers in dealing with the amount of available information today, but they need to have the right tools and support available to maximise output responsibly.

The two key uses of TDM are increasing discoverability of content and discerning patterns and relationships from vast amounts of text and data:

  1. Discoverability means the retrieval of highly relevant full-text articles that not only include the searched keywords but also reflect the necessary relationship between them. It refers to the ability of TDM to connect the dots between sources, which can include not only research papers, but also patents, clinical trials, and even an institution’s own internal datasets.
  2. Pattern discernment refers to the identification of patterns and trends across a dataset. In this case, the output of the TDM is not articles or resources, but rather hypotheses and predictions.
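
To make pattern discernment a little more concrete, here is a minimal sketch in Python. It counts how often pairs of terms appear together across a handful of texts, which is one very simple way to surface relationships in a corpus. The abstracts and the list of tracked terms are invented for illustration; real TDM pipelines work on far larger licensed corpora and use much richer language processing.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus: in practice these would be abstracts or full texts
# retrieved under a TDM licence.
abstracts = [
    "Graphene coatings improve battery anode stability.",
    "Battery anode degradation is reduced by graphene additives.",
    "Silicon anode swelling limits battery lifetime.",
]

# Illustrative vocabulary of terms to track (an assumption, not a real ontology).
terms = {"graphene", "battery", "anode", "silicon"}


def terms_in(text):
    """Return the tracked terms that appear in one text (case-insensitive)."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return terms & words


pair_counts = Counter()
for text in abstracts:
    # Count each unordered pair of tracked terms once per document.
    for pair in combinations(sorted(terms_in(text)), 2):
        pair_counts[pair] += 1

for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

Pairs that co-occur unusually often are candidates for a closer look, which is the sense in which the output is a hypothesis rather than a list of articles.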

Machine-readable formats: Making information available for TDM

To enable TDM, information must be available in machine-readable formats such as XML, not only in PDF files which humans can read. XML (Extensible Markup Language) is a flexible, structured format that is used to store and transport data, in which users can define tags and other data structures. Information within these files is structured and tagged such that machines can quickly identify data points and utilise them for various applications. 

Springer Nature, like other publishers, creates rich full text XML versions of its publications and other resources, curated specifically for machines to read, available for TDM purposes. In these, individual metadata is tagged along with various types of information for different disciplines, fields, and purposes (such as chemical formulas). This makes the information available for TDM.
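
As a rough illustration of why tagging matters, the sketch below parses a small XML record with Python's standard library. The record and its tag names are made up for this example and do not reproduce Springer Nature's or any other publisher's actual schema; the point is only that tagged data points can be picked out directly, without guessing at layout as one would with a PDF.

```python
import xml.etree.ElementTree as ET

# Illustrative record only: the tag and attribute names are invented for
# this sketch and are not a real publisher schema.
SAMPLE_XML = """
<article>
  <title>Catalytic reduction of carbon dioxide</title>
  <author>A. Researcher</author>
  <year>2024</year>
  <body>
    <p>We study <chemical formula="CO2">carbon dioxide</chemical> conversion
    to <chemical formula="CH3OH">methanol</chemical>.</p>
  </body>
</article>
"""

root = ET.fromstring(SAMPLE_XML.strip())

# Because every data point is tagged, each one can be addressed by name.
title = root.findtext("title")
year = int(root.findtext("year"))

# Collect every tagged chemical formula, wherever it appears in the text.
formulas = [el.get("formula") for el in root.iter("chemical")]

print(title, year, formulas)
```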

APIs: The messenger that delivers TDM outputs

All the beneficial outputs of TDM are delivered via Application Programming Interfaces (APIs). APIs are the building blocks of the digital ecosystem. They enable the integration of systems, facilitate automation, and drive innovation by allowing applications to tap into vast resources and functionalities of other software.

APIs are like messengers: they take a request, tell the system what you want, and return the response to you. APIs do not replace a database, and they are not summarisation tools. They fill the gap between an AI tool that anyone can use and papers that are meant to be read individually by humans. So whatever AI tools are available to researchers in your institution, APIs will connect them with the information they need to generate insights.

To easily grasp APIs, we can use the restaurant analogy. Imagine you're at a restaurant. You, the customer, have a menu of choices to order from. The waiter (API) takes your order (request) to the kitchen (system/database, like the Springer Nature database) and then brings back your food (data/response, in a machine-readable format).

The kitchen (system) does the work, but you communicate through the waiter (API). Just as you don't need to know how to cook the dish to enjoy it at a restaurant, with APIs, you don't need to know the details of how the system works. You just need to know what you want and how to ask for it.
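
In code, the waiter analogy might look something like the sketch below. It is an illustration of the general request and response pattern, not of any particular API: the endpoint `https://api.example.org/articles`, the parameter names `q`, `api_key` and `p`, and the shape of the JSON response are all assumptions made for this example. No network call is made; a canned response stands in for the server's reply.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, used only to illustrate the pattern.
BASE_URL = "https://api.example.org/articles"


def build_request(query, api_key, page_size=10):
    """The 'order': encode what we want as a URL the API understands."""
    params = {"q": query, "api_key": api_key, "p": page_size}
    return f"{BASE_URL}?{urlencode(params)}"


def read_response(payload):
    """The 'food': pull titles out of a machine-readable JSON response."""
    data = json.loads(payload)
    return [record["title"] for record in data.get("records", [])]


url = build_request("text mining", api_key="YOUR_KEY")

# Canned response standing in for what a server might send back;
# a real call would fetch `url` over HTTPS instead.
canned = '{"records": [{"title": "Mining the literature"}, {"title": "APIs for libraries"}]}'
titles = read_response(canned)

print(url)
print(titles)
```

The caller only needs to know how to phrase the order and how to read what comes back; everything that happens in the kitchen stays hidden.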

Empowering research with advanced API solutions

APIs can be used to identify trends in research or business areas. For optimal results, your researchers can flexibly apply constraints that Springer Nature has created and supports in its APIs. Constraints work like filters that narrow a search down to the most relevant information in a full text. Different combinations of keywords, countries, topical areas, or built-in constraints further refine the output. Applications range from product discovery to finding collaborators, identifying the latest trends in research areas or papers, and more.
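
The idea of combining keywords with constraints can be sketched as a small query builder. The `field:value` syntax and the field names `country` and `year` used here are assumptions for illustration; the constraints an API actually supports, and how they are written, are described in its own documentation.

```python
def build_query(keywords, **constraints):
    """Combine free-text keywords with field:value constraints into one query string."""
    # Quote multi-word keywords so they are treated as a single phrase.
    parts = [f'"{k}"' if " " in k else k for k in keywords]
    # Sort constraint names so the same inputs always give the same query.
    for field, value in sorted(constraints.items()):
        value = str(value)
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    return " ".join(parts)


# One narrowed-down search: a phrase plus two hypothetical constraints.
query = build_query(["solid-state battery"], country="Germany", year=2023)
print(query)

# Varying a single constraint gives a series of queries whose result counts
# could be compared to look for a trend over time.
yearly = [build_query(["solid-state battery"], year=y) for y in (2021, 2022, 2023)]
print(yearly)
```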

The single most important component in TDM and using APIs is the data source from which they extract information. Quality information is what feeds the various AI systems used by researchers and institutions, and that is where Springer Nature plays a pivotal role. Thanks to its broad, inclusive research corpus, made up of robust and validated research, Springer Nature can provide high quality information across a wide variety of disciplines. 

Springer Nature’s suite of APIs includes three APIs:

  1. Meta API: Allows users to access metadata and abstracts for millions of scientific documents (journal articles, book chapters, protocols), making it easier to discover and explore relevant content. 
  2. Open Access API: Allows users to retrieve full-text content from Springer Nature’s open access publications.
  3. Full Text API (TDM): Unlocks the complete text of Springer Nature’s extensive collection of subscription-based content, providing comprehensive data for in-depth analysis.

Springer Nature’s developer portal has been designed with end users in mind and offers a wealth of documentation. The registration process is simple, and for non-commercial purposes you can sign up for the Open Access API and start using it today. A separate licence is available for academic and government organisations, and there is also a licence for corporations with commercial applications. When you have an API licence with Springer Nature, you have the right to:

  • systematically download and store the machine-readable files; 
  • extract, curate, build ontologies and taxonomies, translate, summarise, create knowledge models or graphs, etc.;
  • use TDM output commercially or non-commercially for drug discovery, disease research, regulations, and more;
  • use TDM output with internal AI tools. 

Relying on Springer Nature’s database means that TDM results will be meaningful. And because Springer Nature’s APIs are directly linked to its central database, users can be assured they have the most recent, trusted information: if changes or updates are made to any article or book chapter, or if an item is retracted, these are reflected in the TDM output. This benefit is exclusive to API users.

Dr Roy’s presentation equips information professionals with the foundational knowledge they need to navigate the TDM landscape for research purposes, and shows how they can give their researchers access to this powerful tool. For the full presentation, watch the webinar recording.

Author: Emily Esposito

Emily Esposito, Senior Marketing Manager from New York, passionately connects corporate markets, information managers, and R&D departments. She shares vital insights on 7 critical messages Info Pros want R&D scientists to know through her blog, intertwining her dedication to family with industry expertise, fostering invaluable connections and community.