unicode - How can I decode Arabic and Urdu characters in Python

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I'm trying to convert a base64 string into readable characters, but some of the text contains Urdu and Arabic characters, such as:

\xd8\xb9\xdb\x81\xd8\xaf\xd9\x90 \xd9\x86\xd9\x88
\xd8\xa2\xd9\x8f\xd8\xb1\xd8\xaf\xd9\x88 \xd8\xaa\xd8\xaf\xd8\xb1\xdb\x8c\xd8\xb3 \xd9\x85\xdb\x8c\xda\xba \xd8\xa8\xdb\x81\xd8\xaa\xd8\xb1\xdb\x8c \xda\xa9\xdb\x92 \xd9\x84\xdb\x8c\xdb\x92 \xd8\xa7\xdb\x81\xd8\xaf\xd8\xa7\xd9\x81

When I remove the extra backslashes and paste it into the terminal, I get:

?1??ˉùù?￠ù?±?ˉù ?a?ˉ?±??3 ù
?úo ?¨??a?±? ú?? ù
                  ?? ?§??ˉ?§ù

How can I convert the above string correctly?

Edit:

I got this string from an email using the Gmail API. This is the string from the email:

DQpIaSBNT0hBTU1BRCwNCllvdXIgd29yayDYuduB2K_ZkCDZhtmIICANCjxodHRwczovL2NsYXNzcm9vbS5nb29nbGUuY29tL2MvTWpJek5UVXhNekl5T1RFMS9hL01qTTBPRE14TURrNE9EUXcvc3VibWlzc2lvbnM-ICANCmlzIGR1ZSB0b21vcnJvdy4gV291bGQgeW91IGxpa2UgdG8gdHVybiBpdCBpbj8NCg0K2Lnbgdiv2ZAg2YbZiA0KRHVlOiBKYW4gMjQNCk9QRU4gIA0KPGh0dHBzOi8vY2xhc3Nyb29tLmdvb2dsZS5jb20vYy9Nakl6TlRVeE16SXlPVEUxL2EvTWpNME9ETXhNRGs0T0RRdy9zdWJtaXNzaW9ucz4NCklmIHlvdSBkb24ndCB3YW50IHRvIHJlY2VpdmUgZW1haWxzIGZyb20gQ2xhc3Nyb29tLCB5b3UgY2FuIHVuc3Vic2NyaWJlICANCjxodHRwczovL2NsYXNzcm9vbS5nb29nbGUuY29tL3M-Lg0KDQpHb29nbGUgTExDDQoxNjAwIEFtcGhpdGhlYXRyZSBQa3d5DQpNb3VudGFpbiBWaWV3LCBDQSA5NDA0MyBVU0ENCg==

To decode it, you run it through base64.urlsafe_b64decode. The decoded string contains Unicode characters like those listed above. How do I decode those into the Urdu and Arabic characters?

question from:https://stackoverflow.com/questions/65869364/how-can-i-decode-arabic-and-urdu-characters-python

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:24:26+0000

How can I convert the above string correctly?

# Raw strings keep the backslashes literal, matching the escaped input from the question.
text = (r'\xd8\xb9\xdb\x81\xd8\xaf\xd9\x90 \xd9\x86\xd9\x88 '
        r'\xd8\xa2\xd9\x8f\xd8\xb1\xd8\xaf\xd9\x88 \xd8\xaa\xd8\xaf\xd8\xb1\xdb\x8c\xd8\xb3 \xd9\x85\xdb\x8c\xda\xba \xd8\xa8\xdb\x81\xd8\xaa\xd8\xb1\xdb\x8c \xda\xa9\xdb\x92 \xd9\x84\xdb\x8c\xdb\x92 \xd8\xa7\xdb\x81\xd8\xaf\xd8\xa7\xd9\x81')

The following encode/decode progression could help:

text.encode().decode('unicode-escape').encode('latin1').decode('utf-8')

'عہدِ نو آُردو تدریس میں بہتری کے لیے اہداف'
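
To make the chain above easier to follow, here is a minimal sketch that splits it into named steps. The function name unescape_utf8 and the shortened sample string are illustrative assumptions, not part of the original answer, and the sketch assumes the input holds literal backslash escapes (as a raw string does) rather than already-decoded characters.

```python
# Hypothetical sample: the first two words from the question, with literal backslashes.
escaped = r'\xd8\xb9\xdb\x81\xd8\xaf\xd9\x90 \xd9\x86\xd9\x88'

def unescape_utf8(s: str) -> str:
    """Turn text holding literal \\xNN escapes of UTF-8 bytes into real text."""
    # 1. str -> ASCII bytes that still contain the literal backslashes
    raw = s.encode('ascii')
    # 2. interpret each \xNN escape; every escape becomes one code point in U+0000..U+00FF
    one_char_per_byte = raw.decode('unicode-escape')
    # 3. latin-1 maps code points 0..255 back to the identical byte values
    utf8_bytes = one_char_per_byte.encode('latin1')
    # 4. decode the recovered bytes as the UTF-8 they actually are
    return utf8_bytes.decode('utf-8')

decoded = unescape_utf8(escaped)
print(decoded)
```

Step 3 is the part that usually surprises people: latin-1 is used only as a lossless bridge from "one character per byte" back to bytes, not because the text is Latin-1.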

Update: I got this string from an email using the Gmail API:

textb64 = 'DQpIaSBNT0hBTU1BRCwNCllvdXIgd29yayDYuduB2K_ZkCDZhtmIICANCjxodHRwczovL2NsYXNzcm9vbS5nb29nbGUuY29tL2MvTWpJek5UVXhNekl5T1RFMS9hL01qTTBPRE14TURrNE9EUXcvc3VibWlzc2lvbnM-ICANCmlzIGR1ZSB0b21vcnJvdy4gV291bGQgeW91IGxpa2UgdG8gdHVybiBpdCBpbj8NCg0K2Lnbgdiv2ZAg2YbZiA0KRHVlOiBKYW4gMjQNCk9QRU4gIA0KPGh0dHBzOi8vY2xhc3Nyb29tLmdvb2dsZS5jb20vYy9Nakl6TlRVeE16SXlPVEUxL2EvTWpNME9ETXhNRGs0T0RRdy9zdWJtaXNzaW9ucz4NCklmIHlvdSBkb24ndCB3YW50IHRvIHJlY2VpdmUgZW1haWxzIGZyb20gQ2xhc3Nyb29tLCB5b3UgY2FuIHVuc3Vic2NyaWJlICANCjxodHRwczovL2NsYXNzcm9vbS5nb29nbGUuY29tL3M-Lg0KDQpHb29nbGUgTExDDQoxNjAwIEFtcGhpdGhlYXRyZSBQa3d5DQpNb3VudGFpbiBWaWV3LCBDQSA5NDA0MyBVU0ENCg=='
import base64
print(base64.urlsafe_b64decode(textb64).decode('utf-8'))

Hi MOHAMMAD, Your work عہدِ نو https://classroom.google.com/c/MjIzNTUxMzIyOTE1/a/MjM0ODMxMDk4ODQw/submissions is due tomorrow. Would you like to turn it in?
عہدِ نو Due: Jan 24 OPEN https://classroom.google.com/c/MjIzNTUxMzIyOTE1/a/MjM0ODMxMDk4ODQw/submissions If you don't want to receive emails from Classroom, you can unsubscribe https://classroom.google.com/s.

Google LLC 1600 Amphitheatre Pkwy Mountain View, CA 94043 USA
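
The same decoding can be wrapped in a small helper. This is only a sketch under assumptions: decode_body is a hypothetical name, the body is assumed to be UTF-8 (a real message may declare a different charset in its headers), and the padding step is there in case the URL-safe base64 data arrives with its trailing = characters stripped.

```python
import base64

def decode_body(data: str, charset: str = 'utf-8') -> str:
    """Decode URL-safe base64 text (padded or not) into a str."""
    # Re-add any missing '=' so the length is a multiple of 4.
    padded = data + '=' * (-len(data) % 4)
    # Decode to bytes first, then to text using the assumed charset.
    return base64.urlsafe_b64decode(padded).decode(charset)

# Hypothetical round trip: encode a short Urdu phrase, strip the padding, decode it again.
original = '\u0639\u06c1\u062f\u0650 \u0646\u0648'
sample = base64.urlsafe_b64encode(original.encode('utf-8')).decode('ascii').rstrip('=')
print(decode_body(sample))
```

If the terminal still shows question marks or mojibake after this, the decoded str is most likely correct and the problem is the terminal's own encoding or font rather than the Python code.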
