DeepL has gained recognition for its advanced machine translation engine. However, while exploring the inner workings of their website using the Inspector tool in the Firefox browser, I came across another hidden gem: the DeepL API endpoint used to divide texts into sentences, regardless of their length or the language used.

So, why pay Google when you can do it for free?! If you want to know how to use DeepL’s Natural Language Processing (NLP) engine to split text of any length into sentences – no account or API keys required – read on.

Exploring DeepL’s Website with Developer Tools

The first step in this reverse engineering journey involves inspecting the DeepL website using developer tools. By doing so, you can see the requests made by the DeepL engine to translate text.

Step 1: Splitting Text into Sentences

Upon further investigation, I found that the DeepL website makes two POST requests to translate the text.

In order to translate your text, the website makes two essential POST requests to the DeepL API.

The first one is a POST request to https://www2.deepl.com/jsonrpc?method=LMT_split_text. This request captures the text entered by the user and sends it to the DeepL engine for sentence splitting.

The response from this request is a JSON object containing a list of sentences, along with additional information such as the request ID and the detected language codes for the input text. (This is what we’ll use!)
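Abridged, the response looks roughly like this. The `chunks`/`sentences` nesting below matches what the parsing code later in this article reads; other fields returned by the endpoint (the request ID, the detected language codes) are omitted from this sketch:

```json
{
  "jsonrpc": "2.0",
  "result": {
    "texts": [
      {
        "chunks": [
          { "sentences": [ { "text": "First sentence of the input." } ] },
          { "sentences": [ { "text": "Second sentence of the input." } ] }
        ]
      }
    ]
  }
}
```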

Step 2: Translating the Extracted Data

Then, using the data extracted from the previous step, the browser proceeds to make a second POST request to the DeepL engine. This time, it calls https://www2.deepl.com/jsonrpc?method=LMT_handle_jobs.

One of the parameters sent with this request is called jobs. This is a list of dictionaries containing the sentences returned by the DeepL engine in Step 1.

Each dictionary consists of several keys. Notably:

  • kind with the value default;
  • preferred_num_beams with a value of 1;
  • sentences, a list containing a single sentence to be translated;
  • raw_en_context, a list of up to 5 sentences preceding the currently translated sentence. If the raw text has only one sentence or if the current sentence is the first one in the text, this list will be empty;
  • raw_en_context_after, a list containing one sentence that follows the currently translated sentence. If the current sentence is the only sentence or the last sentence in the text, this list will be empty.
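The structure above can be sketched in Python. This is a hypothetical reconstruction of how the browser might assemble the `jobs` list from the sentences returned in Step 1 – the key names follow the observed payload described above, but the exact value shapes (e.g. whether each sentence is a plain string or a richer object) are simplified assumptions, not a documented contract:

```python
def build_jobs(sentences: list[str]) -> list[dict]:
    """Assemble a jobs list resembling the LMT_handle_jobs payload (illustrative only)."""
    jobs = []
    for i, sentence in enumerate(sentences):
        jobs.append({
            "kind": "default",
            "preferred_num_beams": 1,
            # A list containing the single sentence to be translated.
            "sentences": [sentence],
            # Up to 5 preceding sentences; empty for the first sentence.
            "raw_en_context": sentences[max(0, i - 5):i],
            # The single following sentence; empty for the last sentence.
            "raw_en_context_after": sentences[i + 1:i + 2],
        })
    return jobs
```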

Focusing on Sentence Splitting

For the purpose of this article, we will concentrate on the first step only: using DeepL to split text into sentences. It’s worth noting that, unlike the web translation service, which accepts 3,000 characters or fewer, this endpoint appears to have no character limit, and the server handled every request I sent it.

The Working Code

Below is the code I wrote to reverse engineer DeepL’s private API for sentence splitting. It uses the requests module to make the necessary API calls and the json module for handling JSON responses.

import requests
import json

def split_sentences(raw_text: str) -> list[str]:
    """
    Split the given raw text into sentences using DeepL's private API.

    Args:
        raw_text (str): The raw text to be split into sentences.

    Returns:
        list[str]: A list of sentences extracted from the raw text.

    Raises:
        requests.exceptions.RequestException: If there is an error connecting to the DeepL API.
        ValueError: If the API response cannot be decoded or lacks the expected data.
    """
    params = {
        "method": "LMT_split_text",
    }

    json_data = {
        "jsonrpc": "2.0",
        "method": "LMT_split_text",
        "params": {
            "texts": [raw_text],
            "commonJobParams": {
                "mode": "translate",
            },
            "lang": {
                "lang_user_selected": "auto",
            },
        },
    }

    try:
        response = requests.post(
            "https://www2.deepl.com/jsonrpc", params=params, json=json_data, timeout=10
        )
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        raise requests.exceptions.RequestException(f"DeepL connection error: {error}") from error

    try:
        data = response.json()
        # Each chunk holds exactly one sentence; collect their text in order.
        sentence_dicts = data["result"]["texts"][0]["chunks"]
        return [sentence_dict["sentences"][0]["text"] for sentence_dict in sentence_dicts]
    except (json.decoder.JSONDecodeError, KeyError) as error:
        raise ValueError(f"Error getting DeepL data: {error}") from error


def get_text(text_file: str) -> str:
    """
    Reads the contents of a text file and returns the text as a string.

    Args:
        text_file (str): The path to the text file.

    Returns:
        str: The contents of the text file as a string.
    """
    with open(text_file, "r", encoding="utf-8") as file:
        text = file.read()
    return text

# Create a file called "my_text.txt". This will hold the text you need to split into sentences
text = get_text("my_text.txt")
sents = split_sentences(text)

for index, sent in enumerate(sents, start=1):
    print(f"{index}. {sent}")

In the code above, the split_sentences function takes the raw_text input (from the file my_text.txt) and sends a POST request to DeepL’s private API. It then retrieves the list of sentences from the API response and returns them.

To demonstrate the functionality, I’ve included a sample usage where the contents of a text file are loaded and passed to the split_sentences function.

Here’s some sample text, taken from Wikipedia:

Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of “understanding” the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves. Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.

And here are the resulting sentences returned by DeepL, printed along with their corresponding index:

1. Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
2. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them.
3. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.
4. Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.

Conclusion and next steps

By reverse engineering DeepL’s private API, you gain access to an advanced NLP engine for accurate sentence splitting – for free! This opens up exciting possibilities for text processing tasks, regardless of the language involved.

Splitting text into sentences using NLP can be useful in various applications. Here are some of the possible uses (plus many more):

  1. Text summarization: Sentence splitting is often a pre-processing step for text summarization tasks. By splitting the text into sentences, you can then analyze and summarize each sentence individually, extracting important information or generating a condensed version of the text.
  2. Sentiment analysis: Splitting text into sentences allows you to analyze the sentiment of each sentence separately. This can be valuable in sentiment analysis tasks, where you determine the overall sentiment (positive, negative, or neutral) of a piece of text by aggregating the sentiments of its constituent sentences.
  3. Machine translation: When translating text from one language to another, sentence splitting helps to break down the source text into smaller, more manageable units. Translating sentence by sentence can improve the accuracy and coherence of the translated output.
  4. Text classification: Sentence splitting can be useful for text classification tasks. By analyzing sentences individually, you can extract features or patterns that contribute to the classification task. Each sentence can be treated as an independent unit, allowing for more granular analysis.
  5. Named entity recognition: Splitting text into sentences aids in named entity recognition, which involves identifying and categorizing named entities (such as person names, locations, organizations) within a text. Sentence boundaries help define the context for each named entity, assisting in its identification and classification.
  6. Information extraction: By splitting text into sentences, you can identify and extract specific information from individual sentences. This can be beneficial in extracting structured data, such as dates, locations, or numerical values, from unstructured text sources.
  7. Language modeling: Sentence splitting is often a fundamental step in language modeling tasks. By breaking down the text into sentences, you can build language models that predict the next word or sequence of words given the preceding sentence(s).
  8. Text-to-speech synthesis: Splitting text into sentences is important in generating natural-sounding speech. By dividing the text into sentences, you can apply appropriate intonation, pauses, and rhythm to the synthesized speech, making it more human-like.
