Many neurodiverse individuals, such as those with autism, ADHD, or dyslexia, benefit from simplified sentences that are direct and unambiguous. Natural Language Processing (NLP) can be used to make long, complex sentences easier for neurodiverse individuals to understand. Through my research, I have identified several NLP techniques that can be used in combination with each other to simplify complex sentences.
- Dependency Parsing: In this technique, a sentence is split into multiple shorter, logically connected sentences. Dependency parser libraries such as spaCy (a free, open-source library for Natural Language Processing in Python) break a sentence down into its core components (subject, verb, object) and identify its clauses. For example, “The teacher, who was known for her innovative teaching methods, explained the complex topic thoroughly.” is simplified to “The teacher was known for her innovative teaching methods. She explained the topic thoroughly.”
- Sentence Simplification Models: These models are specifically designed to rephrase complex sentences into simpler ones while retaining the original meaning. The most popular options are OpenAI’s language models and transformer-based models such as T5 (Text-To-Text Transfer Transformer). These models are fine-tuned on training datasets for simplification tasks.
- Coreference Resolution: Coreference resolution identifies which words in a sentence refer to the same entity, helping to break down complex structures and clarify meaning. Hugging Face’s transformers library and AllenNLP offer pre-trained models for coreference resolution. The OpenAI and T5 models mentioned above can also perform coreference resolution, but they require fine-tuning.
- Summarization Models: Sentence-level summarization tools such as BERT-based models and OpenAI’s GPT models (fine-tuned on summarization tasks) are fairly effective at simplifying content by focusing on the main points.
- Lexical Simplification Tools: In this technique, we substitute complex words with simpler synonyms for ease of understanding. Tools such as Lexi, YATS (Yet Another Text Simplifier), and BERT-LS (which uses pretrained BERT models) can be used for lexical simplification.
- Paraphrasing Tools: Paraphrasing tools powered by transformer-based models such as BART (Bidirectional and Auto-Regressive Transformers), T5, and Pegasus can rephrase complex sentences into simpler alternatives, making them easier to understand.
- Active Voice Conversion: Active voice is easier to process. For example, “The problem was solved by Sarah.” becomes “Sarah solved the problem.”, which is easier and more direct for neurodiverse individuals to process. ChatGPT models can perform this transformation out of the box in most cases, but fine-tuning can help where needed.
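To make the clause-splitting idea concrete, here is a minimal rule-based sketch in Python. It handles only the non-restrictive “who” clause pattern from the dependency-parsing example (the regex and function name are my own illustrative choices); a real system would locate clause boundaries with spaCy’s dependency parse and substitute a pronoun rather than repeating the subject.

```python
import re

def split_relative_clause(sentence):
    """Split 'SUBJ, who CLAUSE, REST' into two simpler sentences.

    A rule-based sketch of the clause-splitting idea: real systems find
    clause boundaries with a dependency parser such as spaCy rather than
    a regex, and generate a pronoun ('She') instead of repeating SUBJ.
    """
    m = re.match(r"^(?P<subj>[^,]+), who (?P<clause>[^,]+), (?P<rest>.+)$", sentence)
    if not m:
        return [sentence]  # pattern not found: leave the sentence unchanged
    subj = m.group("subj")
    return [f"{subj} {m.group('clause')}.", f"{subj} {m.group('rest')}"]
```

Applied to the teacher example, this yields “The teacher was known for her innovative teaching methods.” followed by “The teacher explained the complex topic thoroughly.”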
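The summarization step can also be illustrated without a neural model. The sketch below is a classic frequency-based extractive baseline (the function name and the crude length-based stopword filter are ad hoc choices of mine): it keeps the sentences whose content words occur most often, whereas the fine-tuned GPT/BERT summarizers mentioned above generate new, shorter text.

```python
import re
from collections import Counter

def extractive_summary(text, n=1):
    """Keep the n highest-scoring sentences, scored by content-word frequency.

    A frequency-based extractive baseline, not the neural abstractive
    summarization described above; it selects sentences, it cannot rewrite them.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)  # crude stopword filter

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z]+", sentence.lower()))

    keep = set(sorted(sentences, key=score, reverse=True)[:n])
    return " ".join(s for s in sentences if s in keep)  # preserve original order
```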
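Lexical simplification can likewise be sketched as a dictionary substitution. The mini-lexicon below is hypothetical; tools such as BERT-LS instead rank candidate substitutes in context and handle inflection and word sense, which this sketch ignores.

```python
import re

# Hypothetical mini-lexicon; real tools (e.g. BERT-LS) rank substitutes in context.
SIMPLE_WORDS = {
    "utilize": "use",
    "commence": "start",
    "approximately": "about",
    "thoroughly": "fully",
}

def lexical_simplify(sentence):
    """Replace complex words with simpler synonyms via dictionary lookup."""
    def repl(match):
        word = match.group(0)
        simple = SIMPLE_WORDS.get(word.lower())
        if simple is None:
            return word
        # Preserve the original word's initial capitalization.
        return simple.capitalize() if word[0].isupper() else simple
    return re.sub(r"[A-Za-z]+", repl, sentence)
```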
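Finally, the passive-to-active rewrite can be sketched with a rule covering simple sentences of the form “X was VERB-ed by Y.” (the tiny irregular-participle table is illustrative only; LLM-based rewriting handles the general case far more robustly):

```python
import re

# Tiny illustrative table mapping irregular past participles to past tense.
IRREGULAR = {"written": "wrote", "taken": "took", "given": "gave"}

def to_active(sentence):
    """Convert '<object> was/were <participle> by <agent>.' to active voice.

    A rule-based sketch; it assumes a single clause and naively lowercases
    the demoted object, so proper nouns in object position would break it.
    """
    m = re.match(r"^(?P<obj>.+?) (?:was|were) (?P<verb>\w+) by (?P<agent>.+?)\.$", sentence)
    if not m:
        return sentence
    verb = IRREGULAR.get(m.group("verb"), m.group("verb"))
    obj = m.group("obj")
    obj = obj[0].lower() + obj[1:]  # demote the old subject mid-sentence
    agent = m.group("agent")
    return f"{agent[0].upper()}{agent[1:]} {verb} {obj}."
```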
As we can see above, recent advancements in Large Language Models make simplifying complex sentences far easier than it used to be. To conclude, a pipeline for neurodiverse-focused complex sentence simplification would look something like this:
Input: User inputs a long and complex sentence.
Processing: We run dependency parsing to split clauses, then apply lexical and syntactic simplifications using the rules and pre-trained models described above, and finally resolve ambiguities through coreference resolution to ensure clarity. If the input text provided by the user is very long, we can also use summarization models to ensure easy comprehension.
Output: Simplified sentences.
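The pipeline stages above can be sketched end to end as a toy Python function. Everything here is a stand-in: a regex clause split plays the role of dependency parsing and a hypothetical two-word lexicon plays the role of lexical simplification, while the coreference and summarization stages (which need pre-trained models) are omitted.

```python
import re

def simplify(sentence):
    """Toy end-to-end simplification pass: clause splitting, then word
    substitution. Illustrates the pipeline shape only; each stage would
    be a parser or neural model in a real system."""
    # 1. Split a non-restrictive 'who' clause (stand-in for dependency parsing).
    m = re.match(r"^([^,]+), who ([^,]+), (.+)$", sentence)
    parts = [f"{m.group(1)} {m.group(2)}.", f"{m.group(1)} {m.group(3)}"] if m else [sentence]
    # 2. Lexical simplification via a tiny hypothetical lexicon.
    lexicon = {"thoroughly": "fully", "complex": "hard"}
    out = []
    for s in parts:
        for hard, easy in lexicon.items():
            s = s.replace(hard, easy)
        out.append(s)
    return " ".join(out)
```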
References:
https://www.irjet.net/archives/V9/i3/IRJET-V9I331.pdf
https://aclanthology.org/C18-1021.pdf
https://aclanthology.org/2024.readi-1.4.pdf
https://arxiv.org/pdf/2302.11957
https://cdn.aaai.org/ojs/21477/21477-13-25490-1-2-20220628.pdf
https://medium.com/huggingface/how-to-train-a-neural-coreference-model-neuralcoref-2-7bb30c1abdfe