infoTECH Feature

November 17, 2016

Revolutionizing Linguistic Data Conversion: Around the Cycle in 80 Hours

If you dare to call something a revolution, you’d better be sure about it and stand firmly behind that statement. When it comes to education, boldly calling any technology revolutionary may sound too loud. Models of schooling haven’t changed much over our history; however, modern educational trends suggest that something big is coming.


1) Education in developed and developing countries is diving into high tech: it relies on data from the cloud, shifts to e-libraries, e-books, and online research tools, and provides everyone with indispensable devices, from personal smartphones to 3D printing rooms and tech-packed spaces. Some of these spaces even get their own names.

2) Education today is based on connectivity at both the micro and macro levels. With new communication channels in hand, social networking and instant messaging used as teaching tools, and international study programs built on distance learning and online courses, it’s clear that education is as globalized as any other industry these days.

Not to mention who we are teaching now: Generation Z. These kids have nimble thumbs for tapping and messaging, they poke fun at AI bots, and they develop software at the age of 15. “Old school” education could never keep up with that pace.

Big Data

Bearing these trends in mind, imagine the massive amount of data that education deals with. Today, the data produced, processed, managed, and stored in education already qualifies as Big Data, and it attracts attention for its applications and functionality. Or, to be more accurate, for its lack of functionality.

Like any other field that produces and processes terabytes and petabytes of data, education has long confronted the same issue: how to extract value from data and put it to real use. Educators, institutions, and the systems that handle big data still lack digital tools that could tame this massive flow and put it in order. They will keep struggling until a digital approach steps in.

Big Challenge

Lexical data is only one part of big data in education, but a huge one. Masses of linguistic data, such as books, publications, dictionaries, and any other content literally locked in letters, form the foundation for international communication and intercultural connectivity. Such content has tremendous value in education, not only for the humanities but for humanity as a whole. For this reason, the importance of managing linguistic data with modern digital tools can hardly be overestimated. It’s crucial, yet awfully complicated. And this is where the revolution starts.

Revolution

When the Digiteum data processing team took on the challenge of converting massive amounts of linguistic data locked in dictionaries, they instantly realized that the only way to find a solution was to think outside the box. The conventional humanities-based approaches, and the models computational linguists had used for this purpose, came up short: it took weeks to analyze the lexical data and more weeks to convert it. On top of that, the non-automated approach resulted in 15-20 percent data loss and lacked quality control standards.

Our experts, developers by profession and inventors by nature, thought: “Why reinvent the wheel instead of using the existing one? Why not take the proven data conversion models from code analysis and programming language compilers and adapt them to the needs of lexical data conversion?” And it worked!

Data Conversion Anew

This simple yet ingenious idea allowed the team to revolutionize linguistic data conversion and bring the Dictionaries Conversion Framework into the world. This unconventional approach unlocked possibilities that surprised even the team itself.

1. Universal Access

Dictionaries store national treasures: the complete language content that shapes the mindsets of whole nations and remains the backbone of whole cultures. The Dictionaries Conversion Framework allows efficient and faultless conversion of massive linguistic data from a number of formats into a number of other formats. In other words, it can turn an old paper dictionary into a formatted online version that can then be published for open access. In an age of ubiquitous TXT, XML, and PDF, plus proprietary Apple and Amazon formats, the ability to convert data between formats is a must rather than a bonus. Borrowing the programming language compiler approach allowed the team to solve this conversion puzzle with ease.
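To illustrate the compiler analogy, here is a minimal sketch in Python. It is not Digiteum’s framework, whose internals the article does not describe; it only shows the general compiler pattern of a front end that parses source text into a format-neutral intermediate representation (IR) and separate back ends that emit target formats. The entry syntax `headword [pos] sense; sense`, the `Entry` class, and the function names are hypothetical.

```python
import json
import re
from dataclasses import dataclass, field
from xml.etree import ElementTree as ET


@dataclass
class Entry:
    """Format-neutral intermediate representation of one dictionary entry."""
    headword: str
    pos: str
    senses: list[str] = field(default_factory=list)


# Hypothetical source syntax: "headword [pos] first sense; second sense"
ENTRY_RE = re.compile(r"^(?P<head>[^\[]+?)\s*\[(?P<pos>[^\]]+)\]\s*(?P<senses>.+)$")


def parse_entry(line: str) -> Entry:
    """Front end: turn one raw source line into the IR, failing loudly
    instead of silently dropping data."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unparseable entry: {line!r}")
    senses = [s.strip() for s in m.group("senses").split(";") if s.strip()]
    return Entry(m.group("head"), m.group("pos"), senses)


def emit_xml(entry: Entry) -> str:
    """Back end 1: serialize the IR as XML."""
    root = ET.Element("entry")
    ET.SubElement(root, "headword").text = entry.headword
    ET.SubElement(root, "pos").text = entry.pos
    for sense in entry.senses:
        ET.SubElement(root, "sense").text = sense
    return ET.tostring(root, encoding="unicode")


def emit_json(entry: Entry) -> str:
    """Back end 2: serialize the IR as JSON."""
    return json.dumps({"headword": entry.headword, "pos": entry.pos,
                       "senses": entry.senses})


BACKENDS = {"xml": emit_xml, "json": emit_json}


def convert(line: str, target: str) -> str:
    """Parse once into the IR, then hand the IR to the chosen back end."""
    return BACKENDS[target](parse_entry(line))


if __name__ == "__main__":
    raw = "house [n] a building for living in; a family line"
    print(convert(raw, "xml"))
    print(convert(raw, "json"))
```

The design choice that makes this scale is the IR: with N source formats and M target formats, you write N parsers and M emitters instead of N × M direct converters, which is the same reason compilers separate front ends from back ends.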

2. Quality Control

Imagine a dictionary that consists of millions of entries. Once you convert this lexical data into another format or restructure it into a new layout, you need a way to verify the results. Checking them manually would take weeks or months.

With the team’s approach, millions shrank into hundreds. The team devised a method to reduce all the entries to the set of unique ones and use that set as a test pattern for verification. This is how the quality control issue was solved, with a conversion quality guarantee close to 100 percent.

3. Cost Efficiency

Optimizing operations alone was not enough to increase efficiency. To take lexical data conversion to a new level, the team found a way to make it faster and cheaper, with no need to engage rare specialists. Since the Dictionaries Conversion Framework applies an IT approach to data analysis and conversion, the team that performs the conversion is also IT-structured. One linguist who does the analysis, one developer who performs the conversion, and one QA engineer who closes the cycle with testing work as a conveyor and can complete the conversion of a single dictionary within 80 hours. It is an ambition worthy of a modern Jules Verne story, backed by a scientific approach.

The value of this benefit is global: if dictionaries can be converted this efficiently, we may bring digitally underrepresented linguistic content to nations and cultures regardless of cost. In this context, it means supplying territories, countries, and whole nations that struggle with insufficient investment with educational materials in sufficient quantity.

4. Unlimited Scalability

Much as Turing’s work reached far beyond breaking Enigma, the Dictionaries Conversion Framework has virtually unlimited scalability potential: it may go beyond lexical data conversion and apply to legacy data across other educational repositories, not to mention other industries and businesses that suffer from a lack of data management technologies. Research by Data Conversion Laboratory shows that many industries, fields, and businesses have concerns about their data management capabilities, and about data conversion in particular. Why not solve them digitally?

Can one technology be responsible for the whole cycle of operations? Clearly not. In dictionary conversion, Digiteum’s Dictionaries Conversion Framework is one part of the full set of operations and technologies involved in the process. Still, it would be remiss not to acknowledge the importance of the technology that drives the whole cycle.

Would it be fair to call this technology revolutionary for education? Dictionary conversion brings valuable linguistic content into the world and makes it available to scholars and learners worldwide. It gives content that is underrepresented digitally and online a chance to reach open access, and it gives legacy data a new start and a new form. In a way, something similar happened when the world saw Gutenberg’s printing press. Sounds like a revolution.

Edited by Alicia Young
