Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

unicode - How can I decode Arabic and Urdu characters in Python?

I'm trying to convert a base64 string into readable characters, but some of the text contains Urdu and Arabic characters, such as:

\xd8\xb9\xdb\x81\xd8\xaf\xd9\x90 \xd9\x86\xd9\x88
\xd8\xa2\xd9\x8f\xd8\xb1\xd8\xaf\xd9\x88 \xd8\xaa\xd8\xaf\xd8\xb1\xdb\x8c\xd8\xb3 \xd9\x85\xdb\x8c\xda\xba \xd8\xa8\xdb\x81\xd8\xaa\xd8\xb1\xdb\x8c \xda\xa9\xdb\x92 \xd9\x84\xdb\x8c\xdb\x92 \xd8\xa7\xdb\x81\xd8\xaf\xd8\xa7\xd9\x81

When I remove the extra backslashes and paste it into the terminal, I get:

?1??ˉùù?¢ù?±?ˉù ?a?ˉ?±??3 ù
?úo ?¨??a?±? ú?? ù
                  ?? ?§??ˉ?§ù

How can I convert the above string correctly?

Edit:

I got this string from an email using the Gmail API. This is the base64-encoded body of the email:

DQpIaSBNT0hBTU1BRCwNCllvdXIgd29yayDYuduB2K_ZkCDZhtmIICANCjxodHRwczovL2NsYXNzcm9vbS5nb29nbGUuY29tL2MvTWpJek5UVXhNekl5T1RFMS9hL01qTTBPRE14TURrNE9EUXcvc3VibWlzc2lvbnM-ICANCmlzIGR1ZSB0b21vcnJvdy4gV291bGQgeW91IGxpa2UgdG8gdHVybiBpdCBpbj8NCg0K2Lnbgdiv2ZAg2YbZiA0KRHVlOiBKYW4gMjQNCk9QRU4gIA0KPGh0dHBzOi8vY2xhc3Nyb29tLmdvb2dsZS5jb20vYy9Nakl6TlRVeE16SXlPVEUxL2EvTWpNME9ETXhNRGs0T0RRdy9zdWJtaXNzaW9ucz4NCklmIHlvdSBkb24ndCB3YW50IHRvIHJlY2VpdmUgZW1haWxzIGZyb20gQ2xhc3Nyb29tLCB5b3UgY2FuIHVuc3Vic2NyaWJlICANCjxodHRwczovL2NsYXNzcm9vbS5nb29nbGUuY29tL3M-Lg0KDQpHb29nbGUgTExDDQoxNjAwIEFtcGhpdGhlYXRyZSBQa3d5DQpNb3VudGFpbiBWaWV3LCBDQSA5NDA0MyBVU0ENCg==

To decode it, you run it through base64.urlsafe_b64decode. The result contains escaped bytes like those listed above. How do I decode those into the Urdu and Arabic characters?

Question from: https://stackoverflow.com/questions/65869364/how-can-i-decode-arabic-and-urdu-characters-python

1 Reply

How can I convert the above string correctly?

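# Raw string literals (r'...') keep each backslash as a literal character,
# which is the form the escaped text arrives in; adjacent literals are joined.
text = (r'\xd8\xb9\xdb\x81\xd8\xaf\xd9\x90 \xd9\x86\xd9\x88 '
        r'\xd8\xa2\xd9\x8f\xd8\xb1\xd8\xaf\xd9\x88 \xd8\xaa\xd8\xaf\xd8\xb1\xdb\x8c\xd8\xb3 \xd9\x85\xdb\x8c\xda\xba \xd8\xa8\xdb\x81\xd8\xaa\xd8\xb1\xdb\x8c \xda\xa9\xdb\x92 \xd9\x84\xdb\x8c\xdb\x92 \xd8\xa7\xdb\x81\xd8\xaf\xd8\xa7\xd9\x81')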

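If the string holds literal backslash escapes (a backslash followed by x and two hex digits), the following encode/decode chain turns the escapes back into raw bytes and then decodes those bytes as UTF-8: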

text.encode().decode('unicode-escape').encode('latin1').decode('utf-8')

'عہدِ نو آُردو تدریس میں بہتری کے لیے اہداف'
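To see why the chain works, it may help to split it into its four steps. This is only a sketch: it assumes the input really contains literal backslash escapes, and the variable names (escaped, ascii_bytes, code_points, raw_bytes, decoded) are made up for illustration. It uses just the first two words of the sample.

```python
# Assumption: the input holds literal backslash escapes, i.e. the characters
# '\', 'x', 'd', '8', ... rather than already-decoded text.
escaped = r'\xd8\xb9\xdb\x81\xd8\xaf\xd9\x90 \xd9\x86\xd9\x88'

# Step 1: str -> bytes. Everything is ASCII here, backslashes included.
ascii_bytes = escaped.encode()

# Step 2: interpret the \xNN escapes. Each escape becomes one code point
# in the range U+0000..U+00FF, so the result is still not the real text.
code_points = ascii_bytes.decode('unicode-escape')

# Step 3: latin1 maps every code point below 256 to the single byte with
# the same value, which recovers the original raw bytes.
raw_bytes = code_points.encode('latin1')

# Step 4: those raw bytes are UTF-8, so decode them as UTF-8.
decoded = raw_bytes.decode('utf-8')
print(decoded)
```

Stopping after step 2, or decoding the raw bytes with a single-byte codec instead of UTF-8, gives accented Latin gibberish much like the terminal output in the question.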

Update: for the string obtained from an email via the Gmail API, decode the base64 to bytes first and then decode those bytes as UTF-8:

textb64 = 'DQpIaSBNT0hBTU1BRCwNCllvdXIgd29yayDYuduB2K_ZkCDZhtmIICANCjxodHRwczovL2NsYXNzcm9vbS5nb29nbGUuY29tL2MvTWpJek5UVXhNekl5T1RFMS9hL01qTTBPRE14TURrNE9EUXcvc3VibWlzc2lvbnM-ICANCmlzIGR1ZSB0b21vcnJvdy4gV291bGQgeW91IGxpa2UgdG8gdHVybiBpdCBpbj8NCg0K2Lnbgdiv2ZAg2YbZiA0KRHVlOiBKYW4gMjQNCk9QRU4gIA0KPGh0dHBzOi8vY2xhc3Nyb29tLmdvb2dsZS5jb20vYy9Nakl6TlRVeE16SXlPVEUxL2EvTWpNME9ETXhNRGs0T0RRdy9zdWJtaXNzaW9ucz4NCklmIHlvdSBkb24ndCB3YW50IHRvIHJlY2VpdmUgZW1haWxzIGZyb20gQ2xhc3Nyb29tLCB5b3UgY2FuIHVuc3Vic2NyaWJlICANCjxodHRwczovL2NsYXNzcm9vbS5nb29nbGUuY29tL3M-Lg0KDQpHb29nbGUgTExDDQoxNjAwIEFtcGhpdGhlYXRyZSBQa3d5DQpNb3VudGFpbiBWaWV3LCBDQSA5NDA0MyBVU0ENCg=='
import base64
print(base64.urlsafe_b64decode(textb64).decode('utf-8'))

Hi MOHAMMAD,
Your work عہدِ نو
<https://classroom.google.com/c/MjIzNTUxMzIyOTE1/a/MjM0ODMxMDk4ODQw/submissions>
is due tomorrow. Would you like to turn it in?

عہدِ نو
Due: Jan 24
OPEN
<https://classroom.google.com/c/MjIzNTUxMzIyOTE1/a/MjM0ODMxMDk4ODQw/submissions>
If you don't want to receive emails from Classroom, you can unsubscribe
<https://classroom.google.com/s>.

Google LLC
1600 Amphitheatre Pkwy
Mountain View, CA 94043 USA
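The string in the question already ends with "==", but base64url data sometimes arrives with the trailing padding stripped, and base64.urlsafe_b64decode rejects input whose padding is wrong. A small helper can restore the padding before decoding. This is a sketch rather than anything from the Gmail client library: the function name decode_base64url_text is hypothetical, and it assumes the payload is UTF-8 text.

```python
import base64


def decode_base64url_text(data: str, encoding: str = 'utf-8') -> str:
    """Decode base64url text, tolerating missing '=' padding.

    Hypothetical helper; assumes the decoded bytes are text in `encoding`.
    """
    # Base64 works in groups of 4 characters, so pad up to a multiple of 4.
    padded = data + '=' * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode(encoding)


# Round trip with the padding deliberately stripped.
sample = base64.urlsafe_b64encode('عہدِ نو'.encode('utf-8')).decode('ascii').rstrip('=')
print(decode_base64url_text(sample))
```

Calling decode_base64url_text(textb64) on the string above should give the same result as the two-step version, since correctly padded input passes through unchanged.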

