Domain-specific corpora and NLP solutions for Turkish

We provide comprehensive domain-specific corpora and unique NLP solutions for Turkish, so that you can improve the performance and accuracy of your organization's text mining, machine learning, deep learning and artificial intelligence applications.

We work with a team of expert linguists, computer scientists and software developers. As a result, we can offer numerous tools and solutions designed around the distinctive typology and challenges of the Turkish language.

Below you can find some of our NLP libraries and tools for Turkish. If you’d like to learn more about our services and unique solutions, feel free to get in touch.


Data Parser

A data parser dissects a sequence of tokens. In NLP, a parser analyses a text and builds data structures that represent its grammatical structure.

Simply put, a parser breaks a text down into sentences, phrases and individual words.
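As a rough illustration of breaking a text into sentences and words, here is a minimal Python sketch. The function names and the punctuation-based splitting rule are assumptions made for this example, not Starlang's parser; a production parser would also handle abbreviations, numbers and phrase structure.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence: str) -> list[str]:
    """Collect runs of word characters; the pattern is Unicode-aware,
    so Turkish letters such as ç, ğ, ı, ö, ş and ü are kept."""
    return re.findall(r"\w+", sentence)


def parse(text: str) -> list[list[str]]:
    """Return one list of words per sentence."""
    return [split_words(s) for s in split_sentences(text)]


parsed = parse("Bugün hava çok güzel. Yarın yağmur yağacak mı?")
```

Each inner list in `parsed` holds the words of one sentence, which later steps such as tagging or morphological analysis can take as input.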



Spellchecking

Spellchecking (or spell-checking) is the process of flagging and fixing misspellings. It can be carried out by a spellchecker algorithm or conducted manually.

At Starlang, we aim for both accuracy and speed. That is why we opt for a semi-automated spellchecking process that combines our team of linguists with a comprehensive spellchecker algorithm developed for Turkish.
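A semi-automated workflow of this kind could be sketched as follows: an algorithm flags unknown words and proposes candidates, and a linguist makes the final decision. The tiny `LEXICON` and the function name are hypothetical, and `difflib.get_close_matches` from the Python standard library stands in for a real Turkish spellchecker.

```python
import difflib

# Hypothetical mini-lexicon; a real one would hold far more entries.
LEXICON = ["kitap", "kalem", "öğrenci", "öğretmen", "okul"]


def flag_misspellings(tokens: list[str]) -> dict[str, list[str]]:
    """Map each token missing from the lexicon to up to three suggestions.

    The suggestions are only candidates; a human reviewer picks the fix.
    """
    flagged = {}
    for token in tokens:
        if token not in LEXICON:
            flagged[token] = difflib.get_close_matches(
                token, LEXICON, n=3, cutoff=0.6
            )
    return flagged


review_queue = flag_misspellings(["okul", "kitab", "kalme"])
```

Words found in the lexicon pass through untouched, so the reviewer only sees the flagged tokens and their suggestions.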



Deasciifier

A deasciifier converts Turkish text written with ASCII-only characters into proper Turkish, for example turning “cocuk” into “çocuk.” This process is often referred to as diacritics restoration or diacritics reconstruction.
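The simplest form of diacritics restoration is a word-level lookup, sketched below in Python. The mapping table and function name are hypothetical. A lookup like this cannot resolve ambiguous forms (ASCII “aci” may stand for “acı” or “açı”), which is why practical deasciifiers also rely on context.

```python
# Hypothetical lookup table: ASCII-only spelling -> proper Turkish spelling.
ASCII_TO_TURKISH = {
    "cocuk": "çocuk",
    "agac": "ağaç",
    "gunes": "güneş",
    "isik": "ışık",
    "ogrenci": "öğrenci",
}


def deasciify(text: str) -> str:
    """Replace each known ASCII-only word; unknown words are left as they are."""
    return " ".join(ASCII_TO_TURKISH.get(word, word) for word in text.split())


restored = deasciify("ogrenci agac altinda")
```

Here “altinda” is not in the table, so it stays unchanged instead of being guessed at.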


Case Correction

As part of pre-processing, uppercase letters in the textual data are converted to lowercase.
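Lowercasing Turkish needs extra care, because “I” lowercases to dotless “ı” and “İ” lowercases to “i.” Python's built-in `str.lower()` follows the default Unicode rules instead, so a minimal sketch (the helper name is an assumption for this example) maps those two letters first:

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish rules for I/ı and İ/i."""
    # Handle the two Turkish-specific capitals before the generic lowercasing.
    return text.replace("I", "ı").replace("İ", "i").lower()


word_a = turkish_lower("IŞIK")      # dotless ı is preserved
word_b = turkish_lower("İSTANBUL")  # dotted İ becomes plain i
naive = "IŞIK".lower()              # default rules turn I into dotted i
```

Without this step, “IŞIK” becomes “işik,” which no longer matches the dictionary form “ışık.”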


Correction of Wrong Splits and Merges

Errors caused by merging or splitting words incorrectly need to be fixed, so that the text as a whole can be analysed correctly. That is why we correct such instances either manually or with the help of the Levenshtein and Damerau-Levenshtein distances.
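For reference, both distances can be computed with one dynamic-programming table. The sketch below implements the restricted Damerau-Levenshtein variant (also called optimal string alignment), which adds adjacent transpositions to the insertions, deletions and substitutions of plain Levenshtein. The function name and the `transpositions` flag are choices made for this example.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance between a and b.

    transpositions=False gives plain Levenshtein distance;
    transpositions=True gives the optimal string alignment distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap
    return d[-1][-1]


split_error = edit_distance("her kes", "herkes")  # wrongly split word
swap_error = edit_distance("kiatp", "kitap")      # two letters swapped
```

A wrong split such as “her kes” differs from “herkes” by a single inserted space, so a small distance to a dictionary word is a useful signal that a split or merge should be reviewed.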

Domain-specific Dictionaries

Context makes up a significant portion of word meaning. That is why using domain-specific dictionaries yields better analyses and enhances NLP processes.

As the Starlang team, we offer domain-specific dictionaries that cover the relevant terms and the most frequently used words of each domain.

POS Tagging

POS (part-of-speech) tagging determines the syntactic category of each word (adjective, noun, adverb, verb, conjunction, etc.) and assigns the corresponding tag to it.

Hyponymy Relations

Creating hyponymy relations involves categorizing and sorting words according to the overlaps in their semantic fields, linking each more specific term (the hyponym) to the more general term that covers it.
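One simple way to store such relations is a mapping from each word to its more general term. The entries and function names below are hypothetical examples and do not reflect Starlang's actual data model.

```python
# Hypothetical entries: each word points to its more general term.
HYPERNYM_OF = {
    "kedi": "hayvan",   # cat -> animal
    "köpek": "hayvan",  # dog -> animal
    "hayvan": "canlı",  # animal -> living thing
    "gül": "çiçek",     # rose -> flower
    "çiçek": "bitki",   # flower -> plant
    "bitki": "canlı",   # plant -> living thing
}


def hypernym_chain(word: str) -> list[str]:
    """Follow the links upward and return every more general term."""
    chain = []
    seen = {word}
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        if word in seen:  # guard against accidental cycles in the data
            break
        seen.add(word)
        chain.append(word)
    return chain


def is_hyponym_of(word: str, general_term: str) -> bool:
    """True if general_term appears anywhere above word."""
    return general_term in hypernym_chain(word)


chain = hypernym_chain("gül")
```

With this layout, asking whether “gül” is a kind of “bitki” is a walk up the chain rather than a search over the whole vocabulary.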

Domain-specific Semantic Categorization

Domain-specific semantic categorization aims to create unique categories for domain-specific terms.

Named-entity Recognition

Named-entity recognition (also known as entity extraction or entity identification) is an information extraction process that aims to detect and classify named entities in a text. Classification categories for named entities include person names, organizations, locations, time expressions, monetary values, percentages and so forth.
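Some of these categories follow regular surface patterns and can be illustrated with a short rule-based sketch. The patterns, labels and function name below are assumptions for this example and cover only percentages (written with a leading “%” in Turkish) and lira amounts. Person, location and organization names generally require gazetteers or trained models.

```python
import re

# Hypothetical patterns for two pattern-like entity types.
PATTERNS = {
    "PERCENT": re.compile(r"%\s?\d+(?:[.,]\d+)?"),
    "MONEY": re.compile(r"\d+(?:[.,]\d+)?\s?(?:TL|lira)"),
}


def find_entities(text: str) -> list[tuple[str, str]]:
    """Return (matched text, label) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    found.sort()  # order by position in the text
    return [(surface, label) for _, surface, label in found]


entities = find_entities("Fiyatlar %25 arttı ve bilet 150 TL oldu.")
```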

Semantic Annotation

Our team of linguists annotates each word in a text with respect to its context, its semantic relations and features such as central meaning, connotation, synonymy and antonymy.

Morphological Analysis

In morphological analysis, words are broken into their morphemes so that their internal structure can be analysed. For example, “evlerimizde” (“in our houses”) breaks down into ev + ler + imiz + de.
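The idea of splitting a word into a root and its suffixes can be sketched with a toy suffix stripper. The root list, the suffix list and the function name are hypothetical and far too small for real use. Actual Turkish morphological analysis has to model vowel harmony, consonant changes and ambiguous segmentations.

```python
from typing import Optional

# Hypothetical mini-resources for illustration only.
ROOTS = {"ev", "kitap", "göz"}
SUFFIXES = ["ler", "lar", "imiz", "ımız", "de", "da"]


def segment(word: str) -> Optional[tuple[str, list[str]]]:
    """Strip known suffixes from the right until a known root remains.

    Returns (root, suffixes in surface order), or None if no analysis is found.
    """
    stem = word
    suffixes: list[str] = []
    while stem not in ROOTS:
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) > len(suffix):
                suffixes.insert(0, suffix)
                stem = stem[: -len(suffix)]
                break
        else:
            return None  # no suffix matched and the stem is not a known root
    return stem, suffixes


analysis = segment("kitaplarımızda")  # "in our books"
```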

Annotation for Sentiment Analysis

For sentiment analysis, the base forms in the text are annotated in a two-phase process.

First, our team of linguists identifies the sentiment orientation of each base form: “positive,” “neutral” or “negative.” In the second phase, the forms marked “positive” or “negative” are reassessed to determine whether they are “very positive,” “positive,” “negative” or “very negative.”
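The two phases can be pictured as a small labelling scheme. The sketch below assumes that the second phase only grades intensity and never flips polarity. That assumption, along with the names used, belongs to this example rather than to the annotation guidelines.

```python
PHASE_ONE = ("positive", "neutral", "negative")

# Assumption: phase two keeps the polarity and only grades its strength.
PHASE_TWO = {
    "positive": ("positive", "very positive"),
    "negative": ("negative", "very negative"),
}


def final_label(orientation: str, strong: bool = False) -> str:
    """Combine the phase-one orientation with the phase-two intensity."""
    if orientation not in PHASE_ONE:
        raise ValueError(f"unknown orientation: {orientation!r}")
    if orientation == "neutral":
        return "neutral"  # neutral forms skip the second phase
    mild, intense = PHASE_TWO[orientation]
    return intense if strong else mild


labels = {
    "harika": final_label("positive", strong=True),  # "wonderful"
    "kötü": final_label("negative"),                 # "bad"
    "masa": final_label("neutral"),                  # "table"
}
```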

Do you need NLP solutions for Turkish?