(things i made)

Uliza is a startup providing translation, transcription, and voice recognition services for businesses in Asia and Africa. I worked at Uliza for three months as a data science intern, building a spell checker CLI for Twi, a Ghanaian language.

    Akan Twi Spell Checker


  • Developed the first-ever spell checker for Akan Twi, a language spoken by 10.5 million Ghanaians.
  • Implementation

  • Scraped over 4,000 Twi words with tagged parts of speech (POS) and used this data to train a hidden Markov model that tags POS.
  • Compiled a list of Twi affixes and built a lemmatizer that strips affixes from words.
  • Assessed the string edit distance between the input and known words, taking POS into account, to spell-check input and recommend corrections.
  • https://portfolio-assets-derrick-zhen.s3.us-east-2.amazonaws.com/spellcheck_demo.gif
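
The correction step can be illustrated with a minimal sketch. This is not the original project code: the function names `edit_distance` and `suggest`, the one-point penalty for a POS mismatch, the `max_distance` cutoff, and the four-word toy lexicon with its tags are all assumptions made for illustration. It assumes a POS tag for the misspelled word's position is already available (for example, from a tagger).

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest(word: str,
            lexicon: Dict[str, str],
            expected_pos: Optional[str] = None,
            max_distance: int = 2,
            limit: int = 3) -> List[str]:
    """Rank known words by edit distance, favouring a matching POS tag.

    `lexicon` maps each known word to its POS tag. The +1 penalty for a
    POS mismatch is an assumed weighting, not a value from the project.
    """
    scored: List[Tuple[int, str]] = []
    for known, pos in lexicon.items():
        distance = edit_distance(word, known)
        if distance > max_distance:
            continue
        penalty = 0 if expected_pos is None or pos == expected_pos else 1
        scored.append((distance + penalty, known))
    scored.sort()  # lowest score first, ties broken alphabetically
    return [known for _, known in scored[:limit]]


# Toy lexicon for illustration only; entries and tags are assumptions.
TOY_LEXICON = {"nsuo": "NOUN", "aduane": "NOUN", "kɔ": "VERB", "bra": "VERB"}

if __name__ == "__main__":
    print(suggest("nsuu", TOY_LEXICON, expected_pos="NOUN"))
```

Scanning the whole lexicon is fine for a few thousand words; a larger dictionary would probably want an index (such as a BK-tree) to avoid comparing against every entry.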

    Back Translation QA Tool


  • At Uliza, some translations (from English into a foreign language) are manually translated back into English for QA purposes.
  • This CLI automates the QA process by assessing the semantic similarity between the original English phrase and the back-translation.
  • Implementation

  • Used a doc2vec Python library to assess the similarity between the original phrase and its back-translation.
  • https://portfolio-assets-derrick-zhen.s3.us-east-2.amazonaws.com/btr_tool.gif
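
The scoring flow can be sketched as follows. To keep the example self-contained, it swaps in a plain bag-of-words cosine similarity where the tool used doc2vec embeddings, so it only rewards shared words and misses paraphrases that an embedding model might catch. The function names and the 0.5 review threshold are assumptions, not values from the project.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty phrase is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def back_translation_score(original: str, back_translation: str) -> float:
    """Similarity between the source phrase and its back-translation."""
    return cosine_similarity(Counter(tokenize(original)),
                             Counter(tokenize(back_translation)))


def needs_review(original: str, back_translation: str,
                 threshold: float = 0.5) -> bool:
    """Flag a pair for human QA when similarity falls below the threshold.

    The 0.5 default is an arbitrary placeholder, not a tuned value.
    """
    return back_translation_score(original, back_translation) < threshold


if __name__ == "__main__":
    source = "The clinic opens at nine"
    round_trip = "The clinic opens at nine o'clock"
    print(round(back_translation_score(source, round_trip), 3))
    print(needs_review(source, round_trip))
```

With an embedding model in place of `Counter` vectors, only the vectorization step would change; the cosine comparison and threshold check would stay the same.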

    Frequency Dictionary Compiler CLI


  • CLI that lets data scientists quickly build corpora (libraries of known words) from raw texts such as books.
  • Used on Twi books to extract more than 11,000 words.
  • Implementation

  • Python for string parsing and dictionary building.
  • pallets/click library for building the CLI.
  • https://portfolio-assets-derrick-zhen.s3.us-east-2.amazonaws.com/freq_dict_tool.gif
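
The dictionary-building core can be sketched with the standard library alone. This is an illustration rather than the tool's actual code: the function names, the `min_count` filter, and the letters-only tokenization rule are assumptions, and the click command wrapper is left out so the example runs without third-party packages.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(line: str) -> List[str]:
    """Lowercase a line and return runs of Unicode letters.

    The pattern keeps letters such as ɛ and ɔ, and drops digits,
    underscores, and punctuation.
    """
    return re.findall(r"[^\W\d_]+", line.lower())


def build_frequency_dictionary(lines: Iterable[str],
                               min_count: int = 1) -> List[Tuple[str, int]]:
    """Count word occurrences and return (word, count) pairs.

    Pairs are sorted by descending count, then alphabetically. Words seen
    fewer than `min_count` times are dropped (an assumed noise filter).
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(tokenize(line))
    kept = ((word, count) for word, count in counts.items() if count >= min_count)
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Sample lines are placeholders; a real run would stream lines from a book file.
    sample = ["Kɔ fie, kɔ!", "fie 2 fie"]
    for word, count in build_frequency_dictionary(sample):
        print(f"{word}\t{count}")
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, which keeps memory use flat even for book-length inputs.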