Replacing Text on a PDF

Product: PDFTron SDK (C++)

Product Version: 11.3.8.5075

We are using ElementReader/ElementWriter to iterate over the elements in a PDF and replace existing text, but we have a problem with German special characters. I have tried several approaches but can’t get it to work. Here is the code example:

pdftron::UString ustr{strTextToWrite.c_str()};
pdftron::UChar buff[strTextToWrite.Length()];
unsigned int counter{0u};
for(int i{0}; i < ustr.GetLength(); ++i)
	buff[i] = ustr.GetBuffer()[i];

// Pass the UTF-16BE data to SetTextData
element.SetTextData(buff, strTextToWrite.Length());
element.UpdateTextMetrics();
writer.WriteElement(element);

strTextToWrite contains either the original text (if nothing has changed) or new text that should replace the existing text.
strTextToWrite.c_str() is of type wchar_t*.

Our PDF uses Arial as the font and contains German special characters. When iterating over the PDF and replacing the text (for testing, “replacing” means I write back the text exactly as it already appears in the PDF), some text that contains German special characters is no longer displayed.

For example:
Here you see the second line, which contains an ß
image

After “replacing”, only the ß is displayed (which is odd, because I would have expected this to be the character that can’t be displayed).
image

Obviously I am doing something wrong, but I can’t figure out what.


Hello, I’m AiRon, an automated tech support bot.
While you wait for one of our customer support representatives, I can help you with your question.


It seems like you’re encountering issues with handling German special characters when replacing text in a PDF using ElementReader and ElementWriter in the PDFTron SDK. The problem might be related to character encoding or font embedding issues.

Some things you can try:

  • Ensure that the font being used supports the special characters. You might need to embed the font or use a font that supports the full range of characters you need.

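  • Check which encoding SetTextData expects for the element’s font. Text shown on a PDF page is stored as character codes that are interpreted through the font’s encoding, so UTF-16BE data is only correct if the font’s encoding maps it to the intended glyphs.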

  • Verify that the entire text string, including special characters, is being passed correctly to the SetTextData method.

  • Check if the original text encoding matches the encoding expected by the PDFTron SDK. Conversion issues might arise if there’s a mismatch.
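
To make the encoding points above concrete, here is a minimal sketch of what serializing text as UTF-16BE means at the byte level. It is standalone C++ and does not call the SDK. The helper name toUtf16BeBytes is hypothetical, and whether SetTextData expects bytes in this form for a given font is an assumption that needs to be verified.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the PDFTron SDK): serializes UTF-16 code
// units as big-endian byte pairs, high byte first.
std::vector<unsigned char> toUtf16BeBytes(const std::u16string& text)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text)
    {
        bytes.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF)); // high byte
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));        // low byte
    }
    return bytes;
}
```

Note that the result holds two bytes per code unit, so a size given in characters would cover only half of such a buffer.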



Hopefully, this helps you solve your problem while you wait for a human to get back to you.


Hello,

If I understand correctly, the issue you are running into only happens with symbols such as ß. If no such symbol is part of the word, you don’t see this text replacement issue, correct?

Can you please send me the sample PDF you are testing with? If you can’t upload it via our community forum, can you email it to us and reference this page in your email? You can reach us at support@apryse.com.

Also, is this issue happening with multiple PDFs or only with the specific one you are testing with? If you can also pinpoint which parts of your PDF are causing this issue, our team would be happy to investigate.

Thank you


Hello,

We have the problem with several PDFs. I also found that it doesn’t only happen with German special characters but also with “normal” text, depending on the PDF. My assumption is that it has to do with the encoding of the fonts, but I am not sure about that.

I also noticed that when I open the PDF with another viewer, the text isn’t “gone”. Instead there are squares, which the PDF ViewCtrl we use does not display:

(This leads me to assume that it is related to the encoding.)
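
One way an encoding mismatch can produce missing glyphs or squares, sketched as standalone C++ with no SDK calls: the same bytes read as one-byte codes and as two-byte codes give entirely different character codes. Whether the affected fonts really use two-byte codes is an assumption here, and the helper name readAsTwoByteCodes is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interprets a byte string as big-endian two-byte codes,
// the way a font with a two-byte encoding would read it. A trailing odd byte
// is ignored.
std::vector<std::uint16_t> readAsTwoByteCodes(const std::vector<unsigned char>& bytes)
{
    std::vector<std::uint16_t> codes;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        codes.push_back(static_cast<std::uint16_t>((bytes[i] << 8) | bytes[i + 1]));
    }
    return codes;
}
```

For example, the six one-byte codes 53 74 72 61 DF 65 (“Straße” in Latin-1) turn into the three two-byte codes 5374, 7261 and DF65, which no longer correspond to the original letters.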

I’ll send four PDFs to the support email you mentioned (Before_1 and Before_2, which are the original PDFs, as well as After_1 and After_2, which are the PDFs we generated).

Here’s a bit more code for more context:

	pdftron::PDF::Element element;
	// As long as elements are found
	while((element = reader.Next()) != 0)
	{
		// Element is not visible
		// -> we don't care

		switch(element.GetType())
		{
			// If the element is text
			case pdftron::PDF::Element::e_text:
			{
				String strText{element.GetTextString().ConvertToNativeWString().c_str()};
				auto [koordinaten, textMatrix] = this->GetKoordinaten(element);
				auto paLayer{this->GetLayernameUndID(element)};
				String strLayername{paLayer.first};
				int nLayerID{paLayer.second};

				if (strLayername.IsEmpty() && !strLayernameParent.IsEmpty())
				{
					strLayername = strLayernameParent;
					nLayerID = nLayerIDParent;
				}

				bool bFokus{false};
				String strTextToWrite{strText};
				String strErsetzenDurch{strText};

				pdftron::UString ustr{strTextToWrite.c_str()};
				pdftron::UChar buff[strTextToWrite.Length()];
				unsigned int counter{0u};
				for(int i{0}; i < ustr.GetLength(); ++i)
					buff[i] = ustr.GetBuffer()[i];

				// Pass the UTF-16BE data to SetTextData
				element.SetTextData(buff, strTextToWrite.Length());
				element.UpdateTextMetrics();
				writer.WriteElement(element);

				break;
			}

			case pdftron::PDF::Element::e_form:
				// Possibly rework this as in this example: https://docs.apryse.com/core/samples/elementedittest
				this->UebernehmeObjekteAusFormInNeueForm(pPDFDokument, nSeiteIndex, reader, writer, element, fTextErsetzen, strLayernameParent, nLayerIDParent);
				break;

			default:
				writer.WriteElement(element);
		}
	}

Normally strTextToWrite would be something we manipulate depending on certain conditions. For testing, though, I only “rewrite” the text so that I can see whether this works correctly. As already mentioned, strTextToWrite.c_str() is of type wchar_t*.
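
Two details in the snippet above can be checked without the SDK: buff[strTextToWrite.Length()] is a variable-length array, which is not standard C++, and copying 16-bit code units into a narrower element type drops the high byte. A minimal sketch, assuming pdftron::UChar is an 8-bit type (worth confirming in the SDK headers); the helper name narrowCodeUnits is hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies wide characters into a byte buffer the way the
// loop in the snippet does if the destination type is 8 bits wide.
// Sets `lossy` to true when any character does not fit into one byte.
std::vector<unsigned char> narrowCodeUnits(const std::wstring& text, bool& lossy)
{
    lossy = false;
    std::vector<unsigned char> bytes; // heap buffer instead of a variable-length array
    bytes.reserve(text.size());
    for (wchar_t ch : text)
    {
        const unsigned long value = static_cast<unsigned long>(ch);
        if (value > 0xFFul)
        {
            lossy = true; // the high bits are discarded below
        }
        bytes.push_back(static_cast<unsigned char>(value & 0xFFul));
    }
    return bytes;
}
```

With this helper, ß (U+00DF) survives as the byte DF, while a character such as the euro sign (U+20AC) is reduced to the byte AC. Even the bytes that survive are only meaningful if the font uses a single-byte encoding in which those values select the intended glyphs.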

I am pretty sure I am doing something wrong here, but I can’t figure out what the problem is in this case.

Thank you

Hey Christian,

I got your PDF files. This looks like it requires some additional testing, so I have created a new ticket and reached out to you personally with your next steps. Please reply to that email chain when you are able to.

Thank you
