The International Development Jargon Detector counts jargon words in reports, presentations and other documents.

What's this about?

The International Development Jargon Detector (IDJD) is a bottom-up, data-driven approach to empowering underserved beneficiaries in the field. It was also a way to play with text extraction and Python's Natural Language Toolkit (NLTK).

What does it actually do?

The IDJD uses a predefined list of "jargon" words. It extracts text from most common file formats and counts how many times words from the list appear in the uploaded text. Counting is done on word stems, so "sustain", "sustaining" and "sustainability" all count as the same word. Stop words such as "of", "it" and "the" are ignored.
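
For the curious, the counting step can be sketched roughly as follows. This is an illustration only, not the IDJD's actual code: the jargon list, the stop word set and the crude_stem helper are all made up for the example, and crude_stem is a deliberately naive suffix stripper standing in for a real stemmer such as the ones NLTK provides.

```python
import re
from collections import Counter

# Hypothetical jargon list and stop words, for illustration only.
JARGON_WORDS = ["sustain", "capacity", "stakeholder"]
STOP_WORDS = {"of", "it", "the", "a", "and", "to", "we", "is"}

# A few suffixes to strip. A real stemmer handles far more cases.
SUFFIXES = ("ability", "ing", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Stem the jargon list once so that lookups compare stem to stem.
JARGON_STEMS = {crude_stem(word) for word in JARGON_WORDS}


def count_jargon(text):
    """Count jargon stems in text, skipping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = (crude_stem(t) for t in tokens if t not in STOP_WORDS)
    return Counter(s for s in stems if s in JARGON_STEMS)


sample = "Sustaining the sustainability of stakeholders is how we sustain capacity."
counts = count_jargon(sample)
print(counts["sustain"])  # 3
```

Here "Sustaining", "sustainability" and "sustain" all collapse to the same stem, so they are counted together, while "the", "of", "is" and "we" never reach the counter.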

What file formats can I use?

Most of them! .txt and .doc(x) will work, but so will .pdf and .ppt, and even .csv and .xls(x).

What are some current limitations of the IDJD?

The main limitation is that words are compared one at a time, out of context. This misses phrases like "results oriented" or "in the field". It also means the IDJD can't make qualitative distinctions – "capacity" is considered jargon whether we "built stakeholder capacity" or "installed a water tank with a 10,000L capacity".
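
A small sketch makes the problem concrete. It assumes a hypothetical one-word jargon list and a tiny stop word set, and it skips stemming to keep things short, so it is not how the IDJD itself is written.

```python
import re

# Hypothetical single-word jargon list and stop words, for illustration only.
JARGON_WORDS = {"capacity"}
STOP_WORDS = {"a", "in", "the", "we", "with"}


def content_tokens(text):
    """Lowercase the text, split it into words and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def count_jargon(text):
    """Count tokens that appear in the jargon list, one word at a time."""
    return sum(1 for t in content_tokens(text) if t in JARGON_WORDS)


jargon_use = "We built stakeholder capacity in the field."
literal_use = "We installed a water tank with a 10,000L capacity."

print(count_jargon(jargon_use))    # 1
print(count_jargon(literal_use))   # 1
print(content_tokens(jargon_use))  # ['built', 'stakeholder', 'capacity', 'field']
```

Both sentences score the same, because the counter only ever sees the single word "capacity". And once stop words are dropped, "in the field" has shrunk to "field", so there is no phrase left to match.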

These deficiencies will be addressed in a future version, pending donor funding.

Where can I read more about jargon and international development?

Great question. Here are some articles and blog posts:

Is this serious?

Yes, of course. It's very interesting.

