Aubit4GL Manual


4.1 Curses

If you want Aubit4GL to use 4GL’s character screen control statements (e.g. MENU, DISPLAY, DISPLAY ARRAY, etc.), you will need the curses library: NCURSES v 1.78 or later.

4.1.1 Wide Characters

In the past, a character meant one byte, using either the lower 7 bits (128 possible values) or, later, all 8 bits (256 possible values). The mapping of these numbers to characters is called an encoding. Increasingly, Linux distributions and PostgreSQL are configured by default to use the UTF-8 encoding. Aubit4GL can work with UTF-8, which is a multibyte encoding, but only if a special wide-character version of curses is installed.

4.1.2 Encodings

The following encodings are now common:

Unicode: a multibyte character set, i.e. a single character may need more than one byte to store it. Chinese, Japanese and Korean need far more than 256 possible characters. Java, for example, uses Unicode for its char and String types.

UTF-8: Unicode stored in 8-bit units. A byte with the top bit set is part of a multibyte character. ASCII 0-127 is stored unchanged in a single byte; anything above 127 needs more than one byte. A character takes 1 byte (for values up to 127), 2, 3 or at most 4 bytes.

ISO-8859-1 (http://en.wikipedia.org/wiki/8859-1): a common single-byte, 8-bit encoding that uses the byte values 128-255, beyond the ASCII range, for characters common in Western European alphabets.

ISO-8859-15: similar to ISO-8859-1, but with eight values changed (for example, 0xA4 is the euro sign € instead of the generic currency sign ¤).

The C library has functions such as isprint() that determine whether a character is printable, and the curses library uses these internally to decide whether it can display a character. If LANG is not set, the default "C" locale applies and only the ASCII characters (0-127) are recognised; codes 0-31 and 127 are control codes and are not printable.

This means that if LANG is not set, 0xa3 (the UK pound sign) is not printable, so you cannot use it. The same applies to the accented vowels used in French, the umlauts over o and u in German, and so on. If you set LANG=en_GB, isprint() uses the ISO-8859-1 character set and 0xa3 becomes printable. Note that this is not Unicode or a multibyte encoding: it is a plain single-byte encoding that extends 7-bit ASCII to 8 bits.

On modern Linux distributions such as Ubuntu and openSUSE installed in the UK, LANG is often set to en_GB.UTF-8 (British conventions with the UTF-8 character encoding). In UTF-8 a byte with the high bit set is always part of a multibyte character, so the single byte 0xa3 is no longer available for the pound sign ("£"); a multibyte sequence is needed instead. In UTF-8 it is in fact 2 bytes: c2a3.

If, however, we want to use the Latin (Western) character set, we need a non-UTF-8 locale in LANG (e.g. en_GB), and we can use UI=TUI.

If we want to use UTF-8, we need a UTF-8 locale (e.g. en_GB.utf-8) and we need to use UI=TUI_wide. However, TUI_wide is only built if ./configure detects the wide-character ncurses library, which unfortunately is not always available by default on Linux distributions.

To get Aubit4GL to work with multibyte character sets such as UTF-8, you may have to:

install the wide-character ncurses library;
compile Aubit4GL from source, so that ./configure detects the wide library and the build generates libUI_TUI_wide.so in the plugins directory;
export A4GL_UI=TUI_wide;
set your locale (e.g. LANG=en_GB.UTF-8) so that the C library treats the higher characters as printable.

If you are using ISO-8859-1 or a similar single-byte (i.e. not multibyte) encoding, you can check which characters are printable with a C program like this:

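#include <locale.h>
#include <ctype.h>
#include <stdio.h>

/* List which single-byte values (0-255) isprint() treats as printable
   in the locale selected by LANG / LC_ALL / LC_CTYPE. */
int main(void)
{
  int a;

  /* Without this call the program stays in the default "C" locale
     and LANG has no effect on isprint(). */
  setlocale (LC_ALL, "");

  for (a = 0; a < 256; a++)
  {
    if (isprint(a))
    {
       /* The glyph only shows correctly if the terminal uses the same encoding */
       printf("%3d %c\n", a, a);
    }
    else
    {
       printf("%3d Not printable\n", a);
    }
  }
  return 0;
}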

Run it with different LANG values to compare the results (for example LANG=C and LANG=en_GB). If a character is not printable in your locale, the ncurses form library will not allow you to use it.

4.1.3 LENGTH

Regardless of whether you have multibyte characters, 8-bit characters, or plain ASCII, the 4GL length(str) function returns the byte count of the string, which is greater than the character count when the string contains multibyte characters. Likewise, the substr(i) function may point into the middle of a multibyte character and produce unexpected output (or an unprintable character).
