IndicNLP

Microsoft's Indian language NLP suite — tokenizers, embeddings, and models for 11 Indian languages

Key Features

Tokenizers for 11+ Indian languages
Word embeddings trained on Indian language corpora
Sentence boundary detection for Indic scripts
Script normalization across Indian writing systems
Language identification for Indian languages
Morphological analyzers
Open source Python library
Used in academic and industry research
Foundation for many Indian NLP applications
Maintained by Microsoft Research India
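
To make a few of the features above concrete, here is a minimal sketch of script normalization, sentence boundary detection, and tokenization for Devanagari text. It uses only the Python standard library and does not call IndicNLP's own API. The function names (normalize, split_sentences, tokenize) are hypothetical, and Unicode NFC normalization stands in for the fuller script normalization a dedicated library would perform.

```python
import re
import unicodedata

# Illustrative only: these helpers are assumptions for this sketch,
# not functions provided by IndicNLP.

# Sentence-ending marks: Devanagari danda (U+0964), double danda (U+0965), ?, !
SENTENCE_END = "\u0964\u0965?!"
PUNCTUATION = SENTENCE_END + ",.;:"


def normalize(text: str) -> str:
    """Apply Unicode NFC and collapse runs of whitespace to one space."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list:
    """Split after a sentence-ending mark that is followed by whitespace."""
    parts = re.split(r"(?<=[" + SENTENCE_END + r"])\s+", text)
    return [p for p in parts if p]


def tokenize(sentence: str) -> list:
    """Return words and punctuation marks as separate tokens."""
    punct = re.escape(PUNCTUATION)
    return re.findall(r"[^\s" + punct + r"]+|[" + punct + r"]", sentence)


if __name__ == "__main__":
    raw = "यह  पहला वाक्य है। यह दूसरा वाक्य है।"
    for sentence in split_sentences(normalize(raw)):
        print(tokenize(sentence))
```

Splitting on the danda rather than the Latin full stop is the main difference from English-oriented sentence splitters. A production tool would also need to handle abbreviations, numerals, and script-specific character variants.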

Who Is It For?

Researchers, developers, and teams in academia and industry who build NLP applications for Indian languages.