Domain-Specific Corpora

A domain-specific corpus allows you to accelerate and enhance your text mining, machine learning and NLP processes.


Why have a domain-specific corpus?

Having a domain-specific corpus offers many benefits, such as elevating the accuracy of your NLP processes.

Protecting the Semantic Integrity

Created by our team of linguists, a domain-specific corpus helps you protect the semantic integrity of your textual data.

  • Better search results
  • Facilitating land support processes
  • Auto answer
  • Chatbots
  • Text prediction
  • Automated content production
  • Smart assistants
  • Machine translation

Precise Analyses

Domain-specific corpora elevate the precision and accuracy of your NLP processes and text analyses.

  • Sentiment analysis
  • Market research
  • Competition analysis
  • Intention detection
  • Evaluating trends
  • Analyzing customer reviews
  • Summarizing
  • Named entity recognition


Terms and specific meanings related to your domain are included in the domain-specific corpus.

  • Tokenization
  • POS Tagging
  • Dependency Parsing
  • Stemming and Lemmatization
  • Stop Word Removal
  • Word Sense Disambiguation
  • Noun Chunks
  • Finding Similarity
  • Morphological Analysis
  • Lexical Analysis

Tailor-made NLP Solutions for Your Organization


Spellcheck and Diacritics Reconstruction

As part of pre-processing, misspellings are fixed and diacritics are restored in Turkish texts written with ASCII-only characters.
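To give a rough idea of what diacritics reconstruction involves, here is a minimal Python sketch. It assumes a small, hypothetical word list (LEXICON) and restores a word only when its ASCII-folded form matches exactly one entry. The names LEXICON, fold and restore_diacritics are illustrative assumptions, not a description of our production pipeline, and a real system would also need context to resolve ambiguous forms such as "aci" (either "acı" or "açı").

```python
# Hypothetical mini-lexicon of correctly spelled Turkish words (assumption).
LEXICON = {"çok", "güzel", "şeker", "ağaç", "göz"}

# Map Turkish-specific letters to their ASCII look-alikes.
ASCII_FOLD = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")


def fold(word: str) -> str:
    """Return the lowercase, ASCII-only form of a word."""
    return word.translate(ASCII_FOLD).lower()


# Index the lexicon by folded form; a folded form may have several candidates.
INDEX: dict[str, list[str]] = {}
for entry in LEXICON:
    INDEX.setdefault(fold(entry), []).append(entry)


def restore_diacritics(text: str) -> str:
    """Replace each token with its lexicon form when the match is unambiguous."""
    restored = []
    for token in text.split():
        candidates = INDEX.get(fold(token), [])
        # Leave the token untouched if it is unknown or ambiguous.
        restored.append(candidates[0] if len(candidates) == 1 else token)
    return " ".join(restored)


print(restore_diacritics("cok guzel agac"))
```

Words that are not in the lexicon, such as "kitap", pass through unchanged.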

POS Tagging

During the POS tagging process, the syntactic category of each word (adjective, noun, adverb, verb, conjunction etc.) is determined and a corresponding tag is assigned.

Named-entity Recognition

As an information extraction process, named-entity recognition aims to detect and classify named entities (person names, locations, organizations, time expressions, percentages, monetary values and so forth) in a text.
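The idea of detecting and classifying entities can be illustrated with a small pattern-based Python sketch. It covers only two of the categories above, percentages and monetary values, because these follow regular surface patterns. Person names, locations and organizations generally require annotated data and a trained model. The labels PERCENT and MONEY and the function find_entities are assumptions made for this example.

```python
import re

# Illustrative patterns for two entity types only (assumption, not a full NER system).
PATTERNS = {
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
    "MONEY": re.compile(r"[$€₺]\d+(?:,\d{3})*(?:\.\d+)?"),
}


def find_entities(text: str) -> list[tuple[str, str, int]]:
    """Return (entity text, label, start offset) tuples in order of appearance."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((match.group(), label, match.start()))
    return sorted(entities, key=lambda entity: entity[2])


for entity in find_entities("Sales rose 12.5% to $1,200 this year."):
    print(entity)
```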

Semantic Annotation

Our linguists annotate each word in a text with respect to its context, semantic relations and features.

Polarity Annotation

For the sentiment analysis process, the sentiment orientation of each base form in the data is identified as “very positive,” “positive,” “neutral,” “negative” or “very negative.”
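One way to picture the five-level scale is as an ordered label set attached to base forms. The Python sketch below is an assumption-laden illustration: the numeric values (-2 to 2), the sample entries in POLARITY_LEXICON and the default of "neutral" for unlisted words are all hypothetical choices made for the example, not our annotation scheme.

```python
from enum import IntEnum


class Polarity(IntEnum):
    """Five-level sentiment scale; the numeric values are an assumption."""
    VERY_NEGATIVE = -2
    NEGATIVE = -1
    NEUTRAL = 0
    POSITIVE = 1
    VERY_POSITIVE = 2


# Hypothetical polarity annotations for a few base forms.
POLARITY_LEXICON = {
    "excellent": Polarity.VERY_POSITIVE,
    "good": Polarity.POSITIVE,
    "bad": Polarity.NEGATIVE,
    "terrible": Polarity.VERY_NEGATIVE,
}


def annotate(base_forms: list[str]) -> list[tuple[str, Polarity]]:
    """Pair each base form with its polarity, defaulting to NEUTRAL."""
    return [(word, POLARITY_LEXICON.get(word, Polarity.NEUTRAL)) for word in base_forms]


for word, polarity in annotate(["good", "table", "terrible"]):
    print(word, polarity.name)
```

Using an ordered integer scale keeps the labels comparable, so that downstream code can, for example, check whether one word is more negative than another.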

Always Available

We don’t require a subscription or similar commitments. Once we deliver your domain-specific corpus, it remains available to you. This way, you can incorporate your corpus into your text mining, deep learning, machine learning and artificial intelligence projects at your own pace, whenever you wish.


Collaboration Between Academia and Industry

At Starlang, we believe that collaboration between academia and industry is the driving force of scientific development and economic growth. That is why we base our NLP and software applications on the theoretical and applied branches of computational science and linguistics.

Moreover, all members of our team retain close bonds with academia. They pursue their research interests by publishing and taking part in conferences in their fields while producing state-of-the-art NLP solutions.

Our Partners

Feel free to contact us if you have further questions.