Write non-ASCII characters into a text run

david3 · March 21, 2023, 4:07am

Product: PDF SDK

Product Version: PDFNetPython3==9.4.0

Please give a brief summary of your issue: How do I get the character codes for a particular Unicode string?

Please describe your issue and provide steps to reproduce it:
I have looked at the following examples:

https://groups.google.com/g/pdfnet-sdk/c/b6gXWPhfyZA?pli=1

So I know that I can create a Font containing the Unicode characters I need with something like:
text_content = 'Ànthöné'
Font.Create(doc.GetSDFDoc(), "Helvetica", text_content)

I can then use the font with the ElementBuilder to create the text object. My question is: how do I get the font's character codes for the Unicode characters in my string above (text_content)? Specifically, I need the font's character codes for ['À', 'ö', 'é']. How do I get these dynamically from the Font? I know there is Font.MapToUnicode; I just need the opposite mapping.

The Python samples show the codes already enumerated in hexadecimal:

# Latin
latin = ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 0x45, 0x0046, 0x00C0,
         0x00C1, 0x00C2, 0x0143, 0x0144, 0x0145, 0x0152, '1', '2']  # etc.
writer.WriteElement(eb.CreateUnicodeTextRun(latin, len(latin)))
writer.WriteElement(eb.CreateTextNewLine())

# Greek
greek = [0x039E, 0x039F, 0x03A0, 0x03A1, 0x03A3, 0x03A6, 0x03A8, 0x03A9]
writer.WriteElement(eb.CreateUnicodeTextRun(greek, len(greek)))
writer.WriteElement(eb.CreateTextNewLine())
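The sample's latin list mixes one-character strings with integer code points. If you would rather hand CreateUnicodeTextRun a list of plain integers (which is what the later posts in this thread ended up passing), a small helper can normalize such a list. This is a pure-Python sketch that does not call PDFNet; the name to_code_points is hypothetical, not part of the SDK.

```python
from typing import Iterable, List, Union


def to_code_points(items: Iterable[Union[str, int]]) -> List[int]:
    """Hypothetical helper: turn a mix of single characters and integer
    code points (like the sample's `latin` list) into a list of ints."""
    result = []
    for item in items:
        if isinstance(item, str):
            if len(item) != 1:
                raise ValueError(f"expected a single character, got {item!r}")
            result.append(ord(item))  # character -> its Unicode code point
        else:
            result.append(int(item))  # already a code point such as 0x00C0
    return result


latin = ['a', 'A', 0x45, 0x00C0, 0x0152, '1']
greek = [0x039E, 0x039F, 0x03A0, 0x03A1, 0x03A3, 0x03A6, 0x03A8, 0x03A9]

print(to_code_points(latin))              # [97, 65, 69, 192, 338, 49]
print(''.join(chr(cp) for cp in greek))   # ΞΟΠΡΣΦΨΩ
```

The converted list and its length would then go to CreateUnicodeTextRun the same way the sample passes latin and len(latin).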

Thanks!

shakthi124 · March 21, 2023, 5:21pm

Hello, the hex values passed to the CreateUnicodeTextRun() method are standard Unicode code points, not font-specific character codes. You can look up values to pass to the function in a Unicode code chart.
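As a quick check that these are ordinary Unicode code points, Python's built-in chr() and ord() plus the standard-library unicodedata module can stand in for the code chart. The helper name describe is only for illustration.

```python
import unicodedata


def describe(chars):
    """Map each character to (code point as 'U+XXXX', Unicode character name)."""
    return {ch: (f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in chars}


# The sample's 0x00C0 is simply the code point of 'À'
print(chr(0x00C0), ord('À') == 0x00C0)  # À True

# Code points for the characters asked about in the question
for ch, (code_point, name) in describe(['À', 'ö', 'é']).items():
    print(ch, code_point, name)
# À U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
# ö U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
# é U+00E9 LATIN SMALL LETTER E WITH ACUTE
```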

Note that you also have the option of passing the Unicode string directly to the CreateUnicodeTextRun() method instead:

text_content = 'Ànthöné'
writer.WriteElement(eb.CreateUnicodeTextRun(text_content))

david3 · March 22, 2023, 4:24am

So that’s the first thing I tried, and I get the following error:

    element = eb.CreateUnicodeTextRun(text_content)
E   TypeError: CreateUnicodeTextRun() missing 1 required positional argument: 'text_data_sz'

It looks like that particular overload doesn't exist in the Python bindings? In the Google Groups discussion I linked above, it looks like you can call CreateUnicodeTextRun with a Unicode string.

If I also pass the length of the Unicode string, I get the following error:

element = eb.CreateUnicodeTextRun(text_content, len(text_content))
Fatal Python error: Aborted
Current thread 0x000000010088c580 (most recent call first):
  File "/dev/miniforge3/envs/em_poc/lib/python3.9/site-packages/PDFNetPython3/PDFNetPython.py", line 10905 in CreateUnicodeTextRun

shakthi124 · April 1, 2023, 1:19am

Apologies for the delay. You are correct that this overload does not exist in Python. We will look into improving the API in the future. For now, you can get the Unicode code point of each character with the following Python code:

text_content = 'Ànthöné'
for char in text_content:
    # Get the Unicode code point of the character
    code_point = ord(char)
    # Convert the code point to a hexadecimal string
    hex_value = hex(code_point)

You can then pass these values as a list to CreateUnicodeTextRun(), as the sample does.
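A minimal sketch that builds both arguments from a string, assuming the (list of code points, length) form used in the sample. Two details matter here: the values should be plain integers from ord(), since hex() returns strings such as '0xc0' (the next reply in this thread reports that passing the ord() values directly worked); and NFC normalization is an optional extra step, added here as an assumption, that turns a decomposed 'A' + U+0300 into the single code point U+00C0. The name unicode_run_args is hypothetical.

```python
import unicodedata


def unicode_run_args(text):
    """Hypothetical helper: return (code_points, length) for a call such as
    eb.CreateUnicodeTextRun(code_points, length)."""
    # Optional: compose accents so 'À' is one code point (U+00C0)
    # rather than 'A' followed by a combining grave accent (U+0300).
    text = unicodedata.normalize("NFC", text)
    # Plain ints from ord(); do not convert them to hex strings.
    code_points = [ord(ch) for ch in text]
    return code_points, len(code_points)


code_points, length = unicode_run_args('Ànthöné')
print(code_points, length)  # [192, 110, 116, 104, 246, 110, 233] 7
```

With an ElementBuilder named eb, the result would then be passed as eb.CreateUnicodeTextRun(code_points, length).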

Please let me know if this works for you.

david3 · April 17, 2023, 8:10pm

Thanks for that. I had to tweak the approach slightly to use just the ord() values, without converting them to hex. Now I'm running into the same issue as documented in the thread "Python: CreateTextRun with unicode objects".

Namely, the correct characters are written, but with spaces between them.

david3 · April 17, 2023, 8:55pm

Actually, never mind! The issue had to do with how I created the Font. After updating my code to reuse this bit:

text_content = 'Ànthöné'
Font.Create(doc.GetSDFDoc(), "Helvetica", text_content)

The characters display properly now. Thanks!
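Putting the pieces from this thread together, a sketch of writing one line of text: the font is created with the run's characters as its character set (the Font.Create(doc.GetSDFDoc(), "Helvetica", text_content) call above), and the text run of ord() values is emitted between CreateTextBegin and CreateTextEnd. The function name write_text_block and the default font size are assumptions. eb, writer, and font are passed in, so the function itself does not import PDFNet; positioning (for example, setting a text matrix on the run element) is left out.

```python
def write_text_block(eb, writer, font, text, font_size=12):
    """Hypothetical helper: write `text` as one Unicode text run.

    Assumes `font` was created with `text` in its character set, e.g.
    Font.Create(doc.GetSDFDoc(), "Helvetica", text), as in the post above.
    """
    # Open a text object that uses the given font and size
    writer.WriteElement(eb.CreateTextBegin(font, font_size))
    # Plain integer code points plus their count, as in the sample
    code_points = [ord(ch) for ch in text]
    writer.WriteElement(eb.CreateUnicodeTextRun(code_points, len(code_points)))
    # Close the text object
    writer.WriteElement(eb.CreateTextEnd())
```

In a real document this would sit between writer.Begin(page) and writer.End(), with eb an ElementBuilder and writer an ElementWriter as in the SDK samples.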