Unfake Text
Parse and sanitize obfuscated strings by mapping Cyrillic or Greek lookalikes to Latin. Normalize full-width characters while leaving whitespace unchanged.
About Unfake Text
Unfake Text converts disguised text back to plain text by replacing known homoglyph characters with their Latin counterparts and normalizing full-width characters to standard width.
Features
This tool provides the following features:
- Homoglyph Cleanup - Converts common Cyrillic/Greek lookalike letters back to Latin.
- Full-width Normalization - Converts full-width characters to normal width.
- Preserves Layout - Keeps whitespace and line breaks unchanged.
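The two conversions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: `HOMOGLYPHS` is a small, hypothetical subset of lookalikes, and the full-width rule covers only the Fullwidth ASCII variants (U+FF01 to U+FF5E), which sit at a fixed offset of 0xFEE0 from printable ASCII.

```python
# Hypothetical, deliberately small lookalike table (not the tool's real list).
HOMOGLYPHS = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o",  # Cyrillic small a, ie, o
    "\u0440": "p", "\u0441": "c", "\u0445": "x",  # Cyrillic small er, es, ha
    "\u0410": "A", "\u0415": "E", "\u041e": "O",  # Cyrillic capital A, IE, O
    "\u0391": "A", "\u0395": "E", "\u0399": "I",  # Greek capital Alpha, Epsilon, Iota
    "\u039f": "O", "\u03bf": "o",                 # Greek capital and small Omicron
}

def to_half_width(ch: str) -> str:
    """Map a Fullwidth ASCII variant (U+FF01..U+FF5E) to U+0021..U+007E."""
    code = ord(ch)
    if 0xFF01 <= code <= 0xFF5E:
        return chr(code - 0xFEE0)
    # Everything else passes through; this sketch leaves U+3000 IDEOGRAPHIC
    # SPACE alone on purpose so that whitespace is not altered.
    return ch

def unfake(text: str, homoglyphs: bool = True, full_width: bool = True) -> str:
    """Convert character by character, so line breaks and spacing stay in place."""
    out = []
    for ch in text:
        if full_width:
            ch = to_half_width(ch)
        if homoglyphs:
            ch = HOMOGLYPHS.get(ch, ch)
        out.append(ch)
    return "".join(out)

print(unfake("P\u0430ssw\u043erd r\u0435s\u0435t n\u043ew", full_width=False))  # Password reset now
print(unfake("V\u0395R\u0399FY ACC\u039fUNT"))  # VERIFY ACCOUNT
```

A per-character lookup with a pass-through default is what keeps unmapped characters, including whitespace, exactly as they were.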
Examples
- Unfake Cyrillic lookalikes
  Input: Pаsswоrd rеsеt nоw
  Convert homoglyphs: On
  Convert full-width: Off
  Output: Password reset now
- Unfake full-width text
  Input: Ｈｅｌｌｏ， ｗｏｒｌｄ！
  Convert homoglyphs: Off
  Convert full-width: On
  Output: Hello, world!
- Unfake both
  Input: VΕRΙFY ACCΟUNT
  Convert homoglyphs: On
  Convert full-width: On
  Output: VERIFY ACCOUNT
Real-World Usage Scenarios
- Identifying Homoglyph-Based Phishing - Security analysts can paste suspicious URLs or email display names into the tool to reveal hidden Cyrillic or Greek lookalikes. This helps in de-obfuscating phishing domains that visually mimic legitimate brands to steal credentials.
- Countering Content Moderation Bypasses - Users on social media often swap in lookalike letters or full-width "fancy" text to evade keyword bans. Moderators can normalize such text back to standard Latin so that automated moderation systems correctly flag prohibited terms.
- Database Normalization for Legacy Exports - When migrating data from legacy software or from systems that use East Asian input methods, full-width characters can cause duplicate records and failed searches. This tool converts those characters to standard width, keeping values consistent and preventing "record not found" errors.
- Preparing Text for LLM Ingestion - Developers cleaning datasets for machine learning or Large Language Models (LLMs) use this tool to remove visual noise. Normalized text tokenizes more consistently, because lookalike and full-width characters would otherwise be split into rare, unusual tokens instead of the common ones used for ordinary Latin text.
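For the database and moderation scenarios, Python's standard `unicodedata.normalize` with the NFKC form is a common complementary step, and the sketch below shows what it does and does not cover. `match_key` and `is_duplicate` are hypothetical helper names, and NFKC is a different technique from the tool's homoglyph mapping: it folds full-width forms and mathematical alphanumeric symbols, but leaves Cyrillic and Greek letters untouched.

```python
import unicodedata

def match_key(value: str) -> str:
    """Build a comparison key for deduplication or keyword matching."""
    # NFKC folds compatibility characters: full-width forms, mathematical
    # alphanumeric symbols, ligatures, and U+3000 IDEOGRAPHIC SPACE (to a plain space).
    # Cyrillic and Greek letters have no compatibility mapping and are kept as-is.
    return unicodedata.normalize("NFKC", value).casefold()

def is_duplicate(a: str, b: str) -> bool:
    """Treat two stored values as the same record if their keys match."""
    return match_key(a) == match_key(b)

print(is_duplicate("\uff21\uff23\uff2d\uff25", "Acme"))  # full-width vs ASCII
print(match_key("\U0001d400"))                          # MATHEMATICAL BOLD CAPITAL A
print(match_key("p\u0430ssword") == "password")         # Cyrillic a survives NFKC
```

Because NFKC also rewrites whitespace such as U+3000, it fits best as a comparison key rather than as a rewrite of the stored text; lookalike letters still need a mapping like the one this tool applies.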
Frequently Asked Questions
How does the tool handle mixed scripts?
It scans the entire string and specifically targets known homoglyphs (like Cyrillic 'а' or Greek 'ο') and replaces them with their Latin equivalents while leaving standard characters untouched.
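The tool's exact detection logic is not documented here, but a rough way to spot mixed-script words yourself is sketched below. It assumes the first word of a character's official Unicode name ("LATIN", "CYRILLIC", "GREEK") is a good-enough script label; Python's standard library does not expose the Unicode Script property, so this is a heuristic, and `scripts_in` and `suspicious_tokens` are hypothetical names.

```python
import unicodedata

def scripts_in(token: str) -> set[str]:
    """Collect a rough script label for every letter in a token."""
    scripts = set()
    for ch in token:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
            # Full-width letters come out as "FULLWIDTH" and are not flagged here.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def suspicious_tokens(text: str) -> list[str]:
    """Return whitespace-separated tokens mixing Latin with Cyrillic or Greek letters."""
    flagged = []
    for token in text.split():
        found = scripts_in(token)
        if "LATIN" in found and found & {"CYRILLIC", "GREEK"}:
            flagged.append(token)
    return flagged

print(suspicious_tokens("P\u0430ssw\u043erd reset now"))  # only the first word is flagged
```

Words written entirely in Cyrillic or Greek are not flagged, since those are legitimate text rather than lookalike substitutions.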
Can it detect every possible character variation?
No. The tool focuses on the most common confusables and full-width characters used in phishing and filter bypasses, meaning the characters that are visually indistinguishable from Latin letters in standard web fonts. Rarer lookalikes may pass through unchanged.
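Because coverage is limited, it can help to check what is left after cleanup. The sketch below assumes a deliberately tiny, hypothetical lookalike table (`LOOKALIKES`) and a hypothetical helper `leftover_non_ascii`; it applies the table and then lists any remaining non-ASCII letters by code point and Unicode name, so unmapped confusables are not silently missed.

```python
import unicodedata

# Hypothetical three-entry table, for illustration only.
LOOKALIKES = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o"})

def leftover_non_ascii(text: str) -> list[str]:
    """Apply the table, then report any non-ASCII letters it did not map."""
    cleaned = text.translate(LOOKALIKES)
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in cleaned
        if ch.isalpha() and not ch.isascii()
    ]

print(leftover_non_ascii("l\u0456nk \u0430pple"))
# ['U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I']
```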
Does using this tool affect text layout?
No. The conversion replaces individual characters (Unicode code points) one for one and does not touch anything else. Whitespace, line breaks, and paragraph structure remain exactly as they were in the original input.
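To illustrate that the change happens at the character level rather than the byte level, the sketch below uses Python's built-in `str.translate` with a two-entry example table (`replace_chars` is a hypothetical helper). The character count and whitespace positions stay the same, while the UTF-8 byte length shrinks, because each Cyrillic letter takes two bytes and its Latin replacement takes one.

```python
def replace_chars(text: str, table: dict[str, str]) -> str:
    """Map each code point independently; unmapped characters pass through."""
    return text.translate(str.maketrans(table))

original = "r\u0435s\u0435t\nn\u043ew"  # "reset\nnow" with Cyrillic ie and o
cleaned = replace_chars(original, {"\u0435": "e", "\u043e": "o"})

print(len(original), len(cleaned))                                   # 9 9
print(len(original.encode("utf-8")), len(cleaned.encode("utf-8")))   # 12 9
print(original.index("\n") == cleaned.index("\n"))                   # True
```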
Why do some characters look normal but fail validation in other apps?
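Many characters, such as full-width Latin letters or lookalikes from other alphabets, look identical to the eye but have different Unicode code points, so exact-match comparisons and ASCII-only validation rules reject them. This tool converts them to their standard Latin equivalents so the text passes strict validation.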
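The mismatch is easy to see by printing code points. In this sketch, `describe` is a hypothetical helper, and the `isascii()` check stands in for an assumed strict validation rule; the lookalike string compares unequal to the Latin one and fails the ASCII check even though both render the same.

```python
import unicodedata

def describe(ch: str) -> str:
    """Format a character as its code point and official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

latin = "account"
lookalike = "\u0430ccount"  # first letter is CYRILLIC SMALL LETTER A

print(latin == lookalike)    # False
print(lookalike.isascii())   # False: an ASCII-only rule rejects it
for ch in ("a", "\u0430", "\uff41"):
    print(describe(ch))
# U+0061 LATIN SMALL LETTER A
# U+0430 CYRILLIC SMALL LETTER A
# U+FF41 FULLWIDTH LATIN SMALL LETTER A
```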