Convert Shift-JIS and UTF-8 to PostScript

Hello All,

This is actually a question about a SLES10 installation, also posted on the SUSE forums but not getting any attention just yet… I thought maybe someone here would have a suggestion as well.

I’ve been asked to find a way to convert Shift-JIS and UTF-8 to PostScript on an older SLES10 installation. Currently I use a2ps for PostScript conversions (sometimes in conjunction with iconv), but a2ps doesn’t support Shift-JIS or UTF-8. There are many search references to u2ps, paps and uniprint, but I’m not familiar with any of these. It seems that paps could be a good fit for me. Does anyone have experience with this? Will it run on SLES10?

I’m open to any suggestions that can accomplish this.
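
From what I have read, paps renders UTF-8 text to PostScript via Pango, so I am hoping for something along these lines (untested on my side; file names are just examples):

paps japanese-utf8.txt > japanese-utf8.ps

# and for Shift-JIS input, presumably converting first:
iconv -f sjis -t utf8 japanese-sjis.txt > tmp.utf8
paps tmp.utf8 > japanese-sjis.ps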

SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 1

Thanks to all who reply,
Ray

Not to discourage you, but the few openSUSE users here who also use SLES/SLED mostly watch the SUSE forums as well, so I doubt you will find a large new audience here.

OTOH there might be a general, Linux-wide solution that you may be able to port to SLES 10.

Please enlighten me. I do not know what Shift-JIS is, but UTF-8 is an encoding mostly used for Unicode (so much so that many people use the one term for the other and vice versa, or for the combination; I will use it below for the combination).

PostScript is a way to define pages to be printed.

Nowadays in Linux, text may be in UTF-8 almost everywhere, and most applications will call a common printer interface (which may differ according to desktop) that is able to print to a PostScript file.
Thus when I have a simple text file in UTF-8, I can browse to it using Dolphin, open it with Kwrite, and Kwrite will be able to print it to a PS file. (I just did this with a 15-byte Arabic file in UTF-8 and now have a 12614-byte PS file, which looks fine when I look at it with Okular.)

The same of course goes for an OpenDocument (odt) file or any document that can be handled by LibreOffice: LibreOffice will print to a PS file when asked.

A bit of searching on the internet brought up the usage of groff. Something like

groff -D utf-8 -T ps tf >tf.ps

might do something useful when you want to process it in batch.
-D utf-8 specifies the input encoding;
-T ps specifies the output “device” (as it is called in groff/troff nomenclature);
tf is the file with UTF-8 encoded Unicode characters;

>tf.ps redirects the PostScript output to the file tf.ps.

All, as usual, to be found using

man groff

and for the -D specification it points to

man preconv
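
And when the input is in some other encoding that iconv knows, a pipe in front should do the conversion first. An untested sketch (it assumes iconv on that system knows the OP’s Shift-JIS under the name SHIFT-JIS, and that the groff there is 1.20 or later, which is when the preconv preprocessor behind -D/-K arrived):

iconv -f SHIFT-JIS -t UTF-8 input.txt | groff -D utf-8 -T ps >output.ps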

On 2014-05-08 20:36, hcvv wrote:

> Nowadays in Linux, text may be in UTF-8 almost everywhere, and most
> applications will call a common printer interface (which may differ
> according to desktop) that is able to print to a PostScript file.
> Thus when I have a simple text file in UTF-8, I can browse to it using
> Dolphin, open it with Kwrite, and Kwrite will be able to print it to a
> PS file. (I just did this with a 15-byte Arabic file in UTF-8 and now
> have a 12614-byte PS file, which looks fine when I look at it with
> Okular.)

He refers to this, I believe:


cer@minas-tirith:~> a2ps p.txt -o p.ps
[p.txt (plain): 1 page on 1 sheet]
[Total: 1 page on 1 sheet] saved into the file `p.ps'
cer@minas-tirith:~> gv p.ps
cer@minas-tirith:~> file p.txt p.ps
p.txt: UTF-8 Unicode text
p.ps:  PostScript document text conforming DSC level 3.0
cer@minas-tirith:~>

And the file displays correctly in gv. Now, if I can have a sample text
file using complicated UTF-8 chars, I can test that as well. But what I
have here is just 13.1, and some older releases at home. I don’t know
when SLES did the switch to UTF, though :-?


> cer@minas-tirith:~> a2ps --version
> GNU a2ps 4.13
> Written by Akim Demaille, Miguel Santana.
>
> Copyright (c) 1988-1993 Miguel Santana
> Copyright (c) 1995-2000 Akim Demaille, Miguel Santana
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> cer@minas-tirith:~>

a2ps does some pretty printing of plain text files; it is not as simple
as using Kwrite :wink:


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

I browsed through the man and info pages of a2ps on openSUSE 13.1 (that should be very recent, shouldn’t it?). On specifying the input:

-X, --encoding=NAME
use input encoding NAME

There is an interrogation command:

a2ps --list=encodings

but that has no Unicode and/or UTF-8.
It seems that development there got stuck before they could introduce UTF-8/Unicode. They mention it, but I cannot find that they provide an “Encoding Description File” for it.
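
For anyone who wants to verify that on their own installation, a quick filter:

a2ps --list=encodings | grep -i -e utf -e unicode

which, as said, turns up nothing here.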

On 2014-05-09 11:16, hcvv wrote:

> There is an interrogation command:
>
> Code:
> --------------------
> a2ps --list=encodings
> --------------------
>
> but that has no Unicode and/or UTF-8.

Oh.

> It seems that development there got stuck before they could introduce
> UTF-8/Unicode. They mention it, but I cannot find that they provide an
> “Encoding Description File” for it.

Mmm.

The file I tried was indeed UTF-8 encoded, so there is at least some
support. But maybe it only works with those chars that correspond to the
“normal” iso encodings. It is then a question of trying, and finding out
if the texts the OP uses work or not.


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

On 2014-05-09 12:44, Carlos E. R. wrote:

> Mmm.
>
> The file I tried was indeed UTF-8 encoded, so there is at least some
> support. But maybe it only works with those chars that correspond to the
> “normal” iso encodings. It is then a question of trying, and finding out
> if the texts the OP uses work or not.

I just copied over some Chinese (I think) text and tried to convert it.
The result was simply terrible: no similarity at all with the original.
Not only does the Chinese fail, but the Spanish line I had working in
the file before now also fails.

So it depends on which UTF-8 characters are in the text.
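
The round trip to reproduce this is short (a sketch; it assumes a UTF-8 locale and some PostScript viewer):

printf 'áéíóúñ\n中文测试\n' > sample.txt
a2ps sample.txt -o sample.ps
gv sample.ps

In my test, once the CJK line was in the file, even the Spanish line stopped rendering.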


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

As the first 128 Unicode code points are the same as the 128 ASCII characters, and as the UTF-8 encoding leaves them as one-byte sequences with the same value, any pure ASCII character (or string of them) is also UTF-8 encoded Unicode.
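
You can see that directly on the command line:

printf 'ABC' | hexdump -C
printf 'ABC' | iconv -f ASCII -t UTF-8 | hexdump -C

Both show the same bytes, 41 42 43, so a pure ASCII file needs no conversion at all.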

I tried it with UTF-8 encoded Unicode (from a LO document, copied and pasted into a terminal window with vi open): خراحاڧظ
As I described above, using Kwrite to print it to PostScript gave a ps file that, when opened with Okular, showed the text.

On 2014-05-09 13:46, hcvv wrote:
>
> As the first 128 Unicode code points are the same as the 128 ASCII
> characters, and as the UTF-8 encoding leaves them as one-byte sequences
> with the same value, any pure ASCII character (or string of them) is
> also UTF-8 encoded Unicode.

But I made sure of using Spanish chars, like á or ñ, which are always
above the 127 mark, and they worked.


cer@minas-tirith:~> hexdump -C p.txt
00000000  c3 a1 c3 a9 c3 ad c3 b3  c3 ba c3 b1 e2 82 ac c3  |................|
00000010  a6 c2 ab 0a                                       |....|
00000014
cer@minas-tirith:~>

That’s two-byte encoded (c3 xx, c3 xx…).
See that “hexdump” is unable to render the plain-text part…
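
For the curious, the first pair decodes by hand as ((0xC3 & 0x1F) << 6) | (0xA1 & 0x3F) = 0xC0 | 0x21 = 0xE1, i.e. U+00E1, á:

printf '\xc3\xa1\n'    # prints á in a UTF-8 terminal

(And the e2 82 ac in the middle of the dump is the three-byte sequence for €.)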

> I tried it with UTF-8 encoded Unicode (from a LO document, copied and
> pasted into a terminal window with vi open):
> خراحاڧظ
> As I described above, using Kwrite to print it to PostScript gave a ps
> file that, when opened with Okular, showed the text.

Yes.
But without the pretty-printing formatting that a2ps does.

a2ps can be used in scripts, for instance. Or for an email-to-fax
gateway… It depends on the needs of the OP.


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

It would have been nice if you had added

cat p.txt

Easier to decipher then.

But I checked; it is UTF-8 encoded Unicode indeed.

And hexdump is also a bit limited in that it can only render ASCII.

But leaving this tinkering aside, only the OP can explain what he really wants. If he uses all sorts of features from a2ps, that is a problem. But he only said he used it and that he wanted UTF-8/Unicode to PostScript. No further specifications; no special wish for interactive (GUI) or mass background processing.

On 2014-05-09 14:36, hcvv wrote:
>
> It would have been nice if you had added
>
> Code:
> --------------------
> cat p.txt
> --------------------
>
> Easier to decipher then.

Oh, sorry.


áéíóúñ€æ«


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

Very simple requirement - no GUI. Just need to convert files to PostScript before sending them to the output device.

This is a database application in Japan. The database code page is Shift-JIS, a.k.a. sjis. All users are using the Shift-JIS character set.

In parts of the world using iso-8859-based character sets, I send the output to a file, convert it to PostScript, and lp -d to the printer or PDF converter as required.
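
To be concrete, the current workflow is essentially this sketch (file and printer names are just examples, and latin1 stands for whichever encoding name a2ps --list=encodings shows for the locale):

a2ps -X latin1 report.prn -o report.ps
lp -d printer01 report.ps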

groff is available on the box, so I will play with it over the weekend - Thanks Henk!

First attempt:

/usr/bin/iconv -f sjis -t utf8 sjis-1.prn > sjis-1.utf8

groff -d utf-8 -T ps sjis-1.utf8 > sjis-1.ps
sjis-1.utf8:13: warning: can't find character with input code 12
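
Looking at the man page again: lowercase -d defines a groff string, while the input-encoding options are uppercase -D/-K, which imply the preconv preprocessor (groff 1.20 or later, so it may not exist on a stock SLES10 groff). Also, input code 12 is the ASCII form feed, so the .prn file likely contains page breaks that groff does not accept as-is. Next attempt to try, untested as yet:

iconv -f sjis -t utf8 sjis-1.prn | groff -K utf-8 -T ps > sjis-1.ps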

Thanks to you both,
Ray