This paper covers the history and use of literals (or constants) in programming languages, from the beginning of programming to the present day. Literals in many programming languages are discussed including modern languages such as C, Java, scripting languages, and older languages such as Ada, COBOL, and FORTRAN. Design issues, types of literals, and problems with literals are illustrated. Literals vary across languages much more than most programmers would expect.

 

Literals

Integer Literals

Design Issues for Integer Constants

Ada Integers

Size of Integer

C Family

Arbitrarily Long Integers

Visual BASIC 6.0 and QBasic

Visual Basic .NET Type Designations

Base or Radix of Integers

Questions

Real Literals

Design Issues for Floating Point Constants

Decimal Point Placement

Precision of Reals

Complex Numbers

What is doubled?

FORTRAN 90 Kind Numbers

Questions

Questions

Boolean Literals

Design Issues for Boolean

Character String Literals

Design Issues for Character Strings

String Delimiters

String Escape Sequences

Perl and UNIX Shell Character Strings

Perl Alternative Quotes

Perl Additional Escape Sequences

UNIX Backquotes

Special Literals

C# Verbatim String Literals

Python Triple-Quoted Strings

Here Documents

Date Literals

Lisp Ratios

Array Literals

String Comparison

Repeating Literals

Conclusion

Questions

 

Literals

 

Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 United States License.

Copyright Dennie Van Tassel 2004.

Please send suggestions and comments to dvantassel@gavilan.edu

 

Literals, or constants, are values written in a conventional form whose meaning is obvious. In contrast to variables, literals (123, 4.3, “hi”) do not change in value. They are also called explicit constants or manifest constants. I have also seen them called pure constants, but I am not sure that terminology is widely agreed on. At first glance one may think that all programming languages write their literals the same way. While different languages share many conventions, there are some interesting differences.

 

Literal                  Explanation
285                      Typical integer
34.67                    Typical real
4.23E-4                  Typical scientific
140_345                  Integer in Perl or Ada
true                     Typical boolean
0x1b or Z"1B"            Hexadecimal literal
'B'                      Typical character
"Hello"  or  'Hello'     Typical character string
5HHello                  Old FORTRAN Hollerith string
null   ZERO              Special literals

Various Literals in Different Languages

Table x.1

 

Literals represent the possible values of the primitive types of a language. The most common types of literals are integers, floating point numbers, Booleans, and character strings. Each of these will be discussed in this chapter.

 

Integer Literals

Integers are commonly described as numbers without a decimal point or exponent. Another description for integer literals is a string of decimal digits without a decimal point. Thus the following are valid integers in all languages:

 

   123   0   -14   21345

 

Integers may or may not have a sign and must fall within some restricted range. Negative values need to be preceded by a minus sign. If integers use 32 bits, then the maximum value would be 2^31 - 1 (since one bit is needed for the sign).

 

There are two more integer constants available in some languages:

 

   +45   5e2

 

Early C did not allow +45 since integers without a sign, such as just 45, are positive by default, so no unary positive sign was used. Thus C had a unary negative operator but no unary positive operator. But many later C compilers and Java allow the unneeded positive signs on constants. Few other languages actually forbid unary positive signs.

 

The last constant, 5e2, which evaluates to 500, is a floating point value in C and FORTRAN. Their rule is that a floating point constant has a decimal point or an exponent, or both. Thus 5.0, 5e0, and 5.0e0 are all the same floating point 5.0. But in Ada integers can have positive exponents, so 5e2 (or 5e+2) is the integer 500. Negative exponents are not allowed for Ada integers. Thus 5e-3 is an error in Ada[1], but 5.0e-3 is a floating-point constant.

 

Design Issues for Integer Constants

There are a few design issues for integers. They are:

 

  • What sizes of integer constants are available? For example, do we have short integers, regular integers, and long integers?
  • How do we indicate the particular type of integer constant wanted?
  • What bases of integers are available? Examples that may be available besides decimal could be octal, hexadecimal, or any base.
  • Is there any separator available like the comma used for thousands?

 

Each of these features is available in at least one language, and different languages have different answers.

 

Most languages have one or more default sizes for integers. On a machine with a 16-bit word, integers range from -32,768 to +32,767, that is, from -2^15 to 2^15 - 1. On a machine with a 32-bit word, integers range from -2,147,483,648 to +2,147,483,647, that is, from -2^31 to 2^31 - 1. Today 64-bit integers are common. Unfortunately, computer integers cannot have those useful commas to mark thousands.

 

But this is an oversimplification, since we can have hexadecimal integers, which use letters. We may also want octal values and some way to indicate the desired size of our integers. Also, the definition of integer given earlier is not true for all languages.

 

Ada Integers

For example, in Ada both integer and real literals can have an exponent. Thus in Ada the integer literal 2100 could also be written as:

 

   21e2   210e+1   2100e+0

 

But in many other languages the exponent would indicate that the above are floating point literals. For Ada integers, the exponent must not be negative. Ada also allows us to use the underscore to improve readability. The underscore is often used to separate a number into groups of three digits, as commas are in non-programming areas. Here are some examples:

 

   1_234.56   408_847_1400   1_000_000   12_27_05   4_345e2

 

In most of the above numbers the underscore is placed where a comma would normally be, but the underscore can be placed in any convenient place. Perl and Ruby also allow underscores in their integers.
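The same idea can be tried in Python, which also accepts underscores between digits (an assumption here is Python 3.6 or later). This is only a small sketch; the variable names are made up for illustration.

```python
# Underscores between digits are ignored by the parser (Python 3.6+).
million = 1_000_000
phone = 408_847_1400      # groups do not have to be three digits long
price = 1_234.56          # underscores work in real literals too

print(million)            # 1000000
print(phone == 4088471400)
print(price == 1234.56)
```

As in Ada, the underscore only helps the human reader; the stored value is the same as if it had been typed without it.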

 

Size of Integer

C Family

If we have more than one size of integer, we need some way to indicate the precision of the integer constant. The C family uses an L or l (ell) after an integer to indicate a long integer. Thus 12L is a long integer. We can use the lower case l, but few can tell the difference between 12l (12 and L) and 121 (12 and one), so we always use an upper case L. These suffixes are useful to force arithmetic into a particular precision.

 

Besides long integers, we have unsigned integers in C, which use the suffix u or U. Thus we could write 15u or 15U to get the unsigned integer fifteen. Long unsigned integers are indicated with the terminating ul or UL, so 23ul or 23UL will get an unsigned long integer twenty-three. For regular integers one bit must be saved to store the sign of the integer. If a variable or constant is unsigned, then that bit can be used for the value. Thus a 16-bit signed integer ranges from -32,768 to +32,767 (that is, from -2^15 to 2^15 - 1), but an unsigned integer stored in the same amount of storage can go from 0 to 65,535, which is 2^16 - 1.
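The effect of the sign bit can be seen by reading the same 16 bits both ways. The sketch below uses Python rather than C, simply because Python's int.from_bytes lets us choose the interpretation explicitly; the variable names are illustrative.

```python
# The same 16 bits read as a signed and as an unsigned integer.
bits = b"\xff\xff"  # two bytes with all sixteen bits set

as_signed = int.from_bytes(bits, byteorder="little", signed=True)
as_unsigned = int.from_bytes(bits, byteorder="little", signed=False)

print(as_signed)    # -1
print(as_unsigned)  # 65535, which is 2**16 - 1
```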

 

If we are in a language that has long integers, then how do we use them? For example, if we write 123456789012, we do not want to end up with an integer overflow or truncation. A good compiler would automatically store this integer as a long integer, but we may want to help it (or us) with 123456789012L.

 

Arbitrarily Long Integers

In most languages long integers are restricted to some large size. Python 2 uses the same L to indicate a long integer, as in 12345678901234567890L, but Python long integers can be arbitrarily big; Python 3 dropped the suffix and makes every integer arbitrarily long. Other languages, such as Ruby and the Lisp dialects, also have arbitrarily long integers; these are called bignum systems.
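A minimal sketch of arbitrarily long integers, assuming Python 3 (where the literal is written without any suffix):

```python
# Python 3 integers grow as needed; there is no overflow to a fixed size.
big = 12345678901234567890
square = big * big        # exact result, far beyond 64 bits

print(square)
print(2 ** 100)           # 1267650600228229401496703205376
```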

Visual BASIC 6.0 and QBasic

These forms of BASIC have two types of integers: integer and long integer. Early BASIC did not have types for numbers; there was no distinction between integers and floating point. But now we have several numeric types. For numeric constants a suffix is used on the number to indicate the type. Here is what they use:

 

Numeric Type         Suffix        Bytes of Storage
Integer              %             2
Long integer         &             4
Single precision     none or !     4
Double precision     #             8

Types in BASIC

Table x.2

 

Thus 15% is an integer, 15& is a long integer, and 15 (or 15!) is a single precision floating point value. By default all numbers are single precision reals. If we want a double precision 15, then we type 15#.

 

Visual Basic .NET Type Designations

VB .NET has broken from its BASIC parents and changed the type-designation characters that are appended to numeric literals. Whole numbers (no decimal points) are type Integer and numbers with decimal points are type Double. Otherwise, it uses a method similar to previous dialects of BASIC, but with different codes to change the default type. The VB .NET codes are as follows:

 

S          Short integer

I           Integer

L          Long integer

F          Single-precision floating point

R          Double-precision floating point

D         Decimal

 

So they have three types of integers and two types of floating point. They use Decimal for decimal fractions such as dollars and cents. Thus 45S is a Short integer, 45I (or 45) is an Integer, and 45L is a Long integer. And 234.5F is a Single-precision floating point literal and 234.5R (or 234.5) is a Double-precision floating point literal. Finally, 780.23D is a Decimal currency-type literal.

 

The range of values in VB .NET is much larger than in previous dialects. For example, long integers cover a range of roughly ±9 x 10^18. C# .NET has similar types and value ranges.

 

Base or Radix of Integers

C Family

Sometimes we want a different base or radix of our constants besides base 10. Base 8 and base 16 are useful for storage addresses. The C family allows us to indicate octal constants by preceding the number with a zero. So 012 is octal 12, not decimal twelve. For octal values the range of digits is 0-7.

 

So putting this together with what we learned in the previous section we can use the terminating L to make the constant Long and the U to make it unsigned. Thus 012UL is the unsigned long octal value 12 or the equivalent of the decimal value 10.

 

For hexadecimal values we need to precede the number with an 0x or 0X. Thus 0x12 is hexadecimal 12, not decimal 12. Now the range of acceptable “digits” is 0 1 2 3 ... 9 A B … E F. We can use upper or lower case letters a-f. Again we can use long integer indicator “L” on these too. Thus 07L is a long octal seven, and 0x7L is a long hexadecimal seven. We can also use the terminating U to make it unsigned. Thus 0XFUL is the unsigned long hexadecimal value F, which is equivalent to the decimal value 15.

 

Ruby does the same for octal and hexadecimal literals as C does, but Ruby has added 0b for binary numbers. So in Ruby we can have hexadecimal values like 0x12, octal values like 012, and binary values like 0b1001.
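Python, for comparison, uses the same 0x and 0b prefixes but spells the octal prefix 0o in Python 3 (a bare leading zero such as 012 is rejected there). The sketch below also uses the built-in int(text, base) to convert digit strings; the variable names are illustrative.

```python
# Radix prefixes in Python 3 literals.
hex_value = 0x12        # hexadecimal 12
octal_value = 0o12      # octal 12
binary_value = 0b1001   # binary 1001

print(hex_value, octal_value, binary_value)    # 18 10 9

# The same conversions from text, giving the base explicitly.
print(int("12", 16), int("12", 8), int("1001", 2))   # 18 10 9
```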

 

FORTRAN 90

FORTRAN 90 does this a little differently. It allows radix (number base) 2, 8, or 16. The value starts with the letter B for binary (radix 2), the letter O (oh) for octal, or the letter Z for hexadecimal. The letter is followed by a string of digits enclosed in double or single quotes. The digits must be acceptable for the desired base (no 8 or 9 in octals). The integer value 200 would be B"11001000" for base two, O"310" for base eight, and Z"C8" for base 16. I try very hard not to be chauvinistic, but I sure like the C method better in this case.

 

This FORTRAN 90 solution illustrates the problem of adding a feature to an existing language. They cannot just decide to use the C solution, that all numbers starting with a zero are octal values. Millions of old FORTRAN programs would no longer work correctly when compiled on new FORTRAN 90 compilers, since 012 would be octal 12 instead of decimal 12. On the positive side of this change, thousands of old FORTRAN programmers would suddenly have employment.

 

Ada  

Ada, being a language with always a little more, does what C and FORTRAN do, but has added more bases and uses a different syntax. An integer can be expressed in any base from 2 to 16 by prefixing the number by its base and then bracketing the number within # symbols. Thus the decimal value 35 can be expressed in various bases as follows:

 

   2#100011#   4#203#   8#43#   10#35#   16#23#

 

While this is kind of interesting, I do not see much use for base 7 or 11, but obviously someone did. In addition, C and FORTRAN 90 allow these other bases only for integer constants; Ada also allows floating point constants in different bases. Thus 23.45 could be expressed in base 16 or any other base from 2 to 16.
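To check the Ada examples above, here is a small Python sketch. The function parse_ada_based is a hypothetical helper written for this illustration; it handles only integer literals of the form base#digits# and ignores Ada's optional exponent.

```python
def parse_ada_based(literal):
    """Return the value of an Ada-style based integer such as '16#23#'.

    Hypothetical helper: integer literals only, no exponent part.
    """
    base_text, digits, rest = literal.replace("_", "").split("#")
    if rest != "":
        raise ValueError("exponents after the closing # are not handled")
    base = int(base_text)
    if not 2 <= base <= 16:
        raise ValueError("Ada bases run from 2 to 16")
    return int(digits, base)


for text in ("2#100011#", "4#203#", "8#43#", "10#35#", "16#23#"):
    print(text, "=", parse_ada_based(text))   # every line ends in 35
```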

Questions

1. Suppose you wanted to add more bases to Java or C++. Presently, those languages can only handle decimal, octal, and hexadecimal. The Ada people designed their method in at the beginning, but the FORTRAN people had to add theirs to an existing language. Try to figure out how you could add more bases to C++ or Java without breaking millions of old programs.

Real Literals

Reals are numbers with a decimal point, thus 4.3 is a real literal. Real numbers are called floats or floating point in some languages. Another description of a real is a number with a decimal point or an exponent (or both), thus 2e2 would be a real literal using this definition. Like integer literals, a positive or negative sign can precede the number and no commas are allowed. Thus some real literals are:

 

   0.0   -4.302   7.   3.2e-4   4.9678E+3   4e-3

 

If the language accepts both lower and upper case, the “e” for exponent can be lower case or upper case. Whether 4e-3 is acceptable varies by language; we may need 4.0e-3 (with a decimal point). The “e” stands for exponent and means multiply by 10 raised to the power that follows. Thus

 

   4.3e2 = 4.3 x 10^2 = 4.3 x 100 = 430.0
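Python follows the C and FORTRAN rule that an exponent makes a literal floating point, so the arithmetic above can be checked directly. This is only a sketch; the list name is illustrative.

```python
# Each literal has a decimal point, an exponent, or both, so each is a float.
values = [4.3e2, 5e2, 5.0e0, 4e-3]

for v in values:
    print(type(v).__name__, v)

print(4.3e2 == 4.3 * 10 ** 2)   # the exponent is a power of ten
```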

 

Scientific notation is useful for expressing very small numbers or very large values (such as your chances to win the lottery or the national debt).

 

Design Issues for Floating Point Constants

There are a few design issues for floating point constants. Here are some:

 

  • What sizes of floating point constants are available? For example, do we have float, double, and long double?
  • How do we indicate the particular type of floating point constant we want?
  • What bases of floating point are available? Examples that may be available besides decimal could be octal, hexadecimal, and maybe others.
  • Is there any separator available like the comma used for thousands?

 

There are interesting answers to all the above questions in some language, and different languages have different answers.

 

Decimal Point Placement

Early in this chapter, when we discussed integer literals, we noted that Ada integer literals can also have exponents. So in Ada an exponent alone does not make a literal real; real literals must have a decimal point. Another Ada rule is that real literals must have a digit on each side of the decimal point. Thus neither 4. nor .05 is a legal Ada real literal, but 4.0 and 0.05 are acceptable. COBOL has similar but not identical restrictions on its literals. In COBOL the literal .25 is OK, but 25. is not OK and must be changed to 25.0, since a period followed by a space terminates a statement. In Pascal .04 is not legal, since we need a digit before the decimal point, as in 0.04.

 

Precision of Reals

C Family

The C family has three types of reals: float, double, and long double. And they allow us to indicate the type of the real literal. Real constants such as 3.4, 2.0, and 4.564e-2 are all stored as double by default. If we want 3.4 to be stored as a float (instead of a double) we can add an f or F after the constant, like this: 3.4F or 3.4f. If we want 3.4 to be stored as a long double, then we use l (lower case L) or L like we do with integers. Thus 3.4 as a long double would be 3.4L or 3.4l, but the last one looks a lot like 3.41 (three point four one) instead of 3.4L. All these suffixes are useful to control the amount of storage used and the precision of the result.

   1.0/3.0     // uses double precision.

   1.0F/3.0F   // uses float precision.

   1.0L/3.0L   // uses long double precision.

For the float example, both constants must be float; otherwise the arithmetic would be done in the higher type, that is, double. For the long double example, just one long double constant would force the arithmetic to use long double. This is explained more in the section on Coercion in the Arithmetic chapter.
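The difference in precision can be made visible with a small Python sketch. Python's own float is a C double, so the helper to_single (a name invented for this illustration) rounds a value to 32-bit float precision by packing it with the struct module; this imitates, rather than reproduces, what a C compiler does with 1.0F/3.0F.

```python
import struct


def to_single(x):
    """Round a Python float (a C double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


third_double = 1.0 / 3.0
third_single = to_single(1.0 / 3.0)

print(f"{third_double:.17f}")   # about 16 correct digits
print(f"{third_single:.17f}")   # about 7 correct digits
```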

 

FORTRAN

In FORTRAN the default type is single precision (like float in C). If we type 4.3 we get a single precision real, but we may want it stored as a double precision real. FORTRAN uses the exponent letter D or d to indicate double precision. Thus we can write 4.3D0 or .43d1 to indicate a double precision real value. This is an easy way to force arithmetic into double precision. For example:

   x = 1/3d0

will get us a double precision division because 3d0 is double precision.

 

Complex Numbers

FORTRAN IV has complex numbers. Data of complex type is represented by two numbers in parentheses separated by a comma. The number left of the comma is the real part, while the number to the right of the comma is the imaginary part of the complex number. Thus the complex constant 3 + 2i can be assigned to the complex variable x as follows:

 

   x = (3, 2)

 

Fortran has all the necessary operations and functions to handle complex values. It is interesting how early in computing history complex values were handled by Fortran. Ruby uses a similar syntax for its complex constants.

 

In Python, complex numbers are composed of two floating-point numbers – the real part and the imaginary part – and are coded by adding a J or j to the imaginary part. Thus we can write 3.0 + 4.3J for a complex number. A few other languages have built in complex numbers and the necessary arithmetic operators and functions.
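A short sketch of Python complex literals and arithmetic follows; the variable names are illustrative.

```python
# A complex literal: real part plus an imaginary part marked with J or j.
z = 3.0 + 4.3J
print(z.real, z.imag)      # 3.0 4.3

# The constant 3 + 2i from the FORTRAN example, built with complex().
w = complex(3, 2)
print(w * w)               # (5+12j)
print(abs(3 + 4j))         # 5.0
```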

 

What is doubled?

When we talk about single or double precision of integers and reals, we need to figure out what is doubled. Integers are the easiest to understand since we do not have to worry about an exponent or decimal point. The smallest integer can be stored in one byte, 8 bits, with one bit for the sign. That leaves 7 bits for the value, which gives a range of -128 to +127 (-2^7 to 2^7 - 1). The next size of integer may use two bytes, which allows a range of -32768 to +32767 (-2^15 to 2^15 - 1). The next largest integer uses four bytes, for a range of -2147483648 to +2147483647 (-2^31 to 2^31 - 1). As you have seen, the largest value, the smallest value, and the number of integer types are language and machine dependent, but these sizes hold fairly well across many languages.
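The ranges for each storage size follow from one formula: with one bit reserved for the sign, n bits hold values from -2^(n-1) to 2^(n-1) - 1. A minimal Python sketch, where signed_range is a name chosen for this illustration:

```python
def signed_range(n_bytes):
    """Return (lowest, highest) for a signed integer stored in n_bytes bytes."""
    bits = 8 * n_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for n_bytes in (1, 2, 4, 8):
    low, high = signed_range(n_bytes)
    print(n_bytes, "bytes:", low, "to", high)
```

Each doubling of the storage roughly squares the number of representable values.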

 


The situation gets much more machine and language dependent for floating point values. For reals, there are two parts besides the sign: the exponent and the mantissa. Thus for 3.45e-2, 3.45 is the mantissa and -2 is the exponent. The mantissa is commonly 7 places for the smallest float, 15 places for the next larger float, and finally 31 places for the largest float. Not all languages have three sizes. Early languages only had one size. Newer languages tend to have three sizes, especially when the language is used for scientific programming.

 

There are two ways for programs to get the precision of integers and floating-point values. The way covered so far, and the most common, is that programmers get what the language or hardware gives them. For example, smaller floating-point values have 7 place accuracy and larger floating point values have 15 place accuracy. These defaults are based on the size of words in the hardware. This loss of control is mostly accepted without question. But when we expand the variety and size of the machines available, the defaults change. So both FORTRAN and Ada have means for us programmers to select the exact precision needed.

FORTRAN 90 Kind Numbers

FORTRAN 90 has a method similar to how C marks the precision of its real numbers, but the FORTRAN method is more powerful and flexible. But first we need to discuss why FORTRAN needs variations in default number precision. FORTRAN has been around for decades and is available on very small computers and very large computers. A single-precision real number might have seven significant digits and a double-precision number might have 15 places on many computers. But a small computer may not have that range and a large supercomputer may have twice the range. So if a FORTRAN program is written on one computer, a means is needed to indicate the needed precision when the program is taken to a new computer.

 

So FORTRAN 90 provides a kind number that is used to indicate the kind of precision needed for real and integer values. For real numbers there are at least two default kind numbers and for integer values there are 3 or 4 kind numbers.

 

For real numbers, the examples here use kind number 1 for single precision (7 significant digits) and kind number 2 for double precision (15 significant digits). The actual kind numbers are compiler dependent; many compilers use the storage size in bytes, such as 4 and 8, instead. Some FORTRAN compilers may offer more significant digits under another kind number. To specify the kind of a constant, an underscore followed by a kind number is appended to the constant. Thus 3.14159_1 has a single precision kind and 3.14159_2 has a double precision kind, because it has a “_2” after it. (Notice the underscore in FORTRAN has a different meaning than it has in Perl or Ada.)

 

Integer values have a kind 1 for values in the range of 2^7, kind 2 for values in the range of 2^15, kind 3 for 2^31, and maybe kind 4 for 2^63. Thus 123456789_3 has an integer kind number 3. If a kind value is not supported by a compiler it generates a syntax error when compiling the program.

 

There is a great deal more to this in FORTRAN. We can use the intrinsic functions SELECTED_INT_KIND and SELECTED_REAL_KIND to request the minimum range or precision needed for integer and real values, and store the resulting kind values in named constants. Here are a couple of brief examples:

 

INTEGER, PARAMETER :: Range18 = SELECTED_INT_KIND(18)

INTEGER, PARAMETER :: Prec20 = SELECTED_REAL_KIND(20, 40)

 

First, we need to set up kind indicators. In the above two lines, Range18 names an integer kind that can hold values of up to 18 decimal digits, and Prec20 names a real kind with at least 20 significant digits and a decimal exponent range of at least 40. Now we can use these like we did the kind constants 1, 2, or 3.

 

   12345_Range18

   3.14159_Prec20
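The idea behind choosing an integer kind by decimal range can be sketched in Python. The function selected_int_bytes below is a hypothetical stand-in, not the FORTRAN intrinsic: it assumes storage sizes of 1, 2, 4, and 8 bytes and returns the smallest size that holds every value n with -10^r < n < 10^r, or -1 if none does.

```python
def selected_int_bytes(r):
    """Smallest assumed byte size covering -10**r < n < 10**r, or -1 if none."""
    for n_bytes in (1, 2, 4, 8):
        # A signed integer of this size reaches 2**(bits - 1) - 1.
        if 10 ** r <= 2 ** (8 * n_bytes - 1):
            return n_bytes
    return -1


for r in (2, 4, 9, 18, 19):
    print(r, "digits ->", selected_int_bytes(r))
```

A request for 18 digits needs the 8-byte size, and a request for 19 digits cannot be met with the sizes assumed here.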

 

This was a very brief description of FORTRAN kind numbers. If you want more information you will need to find a FORTRAN 90 textbook. These kind numbers are also available for variables.

 

Questions

1.      Your PhD thesis is to indicate how to expand C++ or Java so these languages can indicate desired precision for constants or variables. Read the previous section on FORTRAN 90 kind numbers. If your method breaks all previous C++ programs you will not obtain your PhD.

2.      Perl and early BASIC do not distinguish between integers and floats. These two languages just have numbers. Do you think this is a good approach? Should we do this in OPL? Why or why not?

 

Questions

1.      We have seen several types of numeric literals or constants. What numeric constants do you think we should have in OPL?

 

2.      Do we want to allow for different integer literals? For example, short or long integers? Do we need one, two, three, or more types of integers? And how shall we indicate what is desired when we type an integer?

 

3.      Do we want to allow for different real literals? For example, float, double, or long double. Do we need one, two, three or more types of real literals? How shall we write these different forms in OPL?

 

4.      What base or radix of integers will we allow: binary, octal, hexadecimal, others? The C family has one way, FORTRAN 90 has another way, and Ada has a third method. And how shall we write the different numbers in OPL?

 

5.      Most or all languages do not allow commas in numeric literals, like 1,234. Is this restriction still necessary? Do you think we should allow commas in numbers for OPL? Notice how Ada handles this.

 

 

Boolean Literals

We need literals for true and false. These are called Booleans or logicals, depending on the language. Some languages have a reserved word or keyword for these values. Booleans are ordinal values, and usually false is less than true. The normal operations are and, or, and not.
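Python is one language where these properties can be observed directly: its literals are True and False, they are ordered, and they convert to the integers 1 and 0. A brief sketch:

```python
# Ordering and integer values of the Boolean literals.
print(False < True)                  # True
print(int(False), int(True))         # 0 1
print(sorted([True, False, True]))   # [False, True, True]

# The three normal operations.
print(True and False, True or False, not True)   # False True False
```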

Design Issues for Boolean

There are a few decisions and differences for Boolean values. Here are some questions:

 

  • Are there special reserved or keywords for the Boolean values?
  • Are the Boolean values ordered? That is, is false < true or vice versa?
  • Are Booleans ordinal values? Can they be used for choices in a switch statement?
  • When talking about booleans do we use the capitalized Booleans or the lower case booleans? Both versions are common in books. This is probably the most difficult problem, since the difficulty of a problem is often inversely related to its importance.

 

FORTRAN

All versions of FORTRAN use .TRUE. and .FALSE. for their logical constants. And in FORTRAN they are called logicals instead of Booleans. Since FORTRAN does not have reserved words, FORTRAN uses a period before and after these logical literals to differentiate them from the variables TRUE and FALSE. If we print a FORTRAN Boolean variable, it will print either T or F, and those are what we need for input if reading Boolean data into a program.

 

Ada, Pascal, ALGOL, and Java use true and false for their Boolean literals.

 

The inputting and outputting of logical values is messy in most languages. If we want to use the integer 123, we can use it exactly that way as a literal constant in the program, read in the integer, or print the integer and it is all the same. It is not as simple with logical literals.

 

Some languages do not have a nice way to input or output Boolean values. For example, in FORTRAN, logicals print as an F or T. But when we want to read in a value for true, we can use the letter t, or period and t (i.e., .t), or period and the word true (i.e., .true), or any string that starts with the letter t, or a period, letter t, then anything. So input, output, and inside the program are all potentially different for FORTRAN logicals.

 

C Family

The C family of languages does not use named constants for logical values. Instead they use 0 (zero) for false and 1 (one) for true as the result of relational or logical expressions. Thus if we tried to print 4 < 3 we would get a zero, and 3 < 4 would get us a 1.

 

   cout << "true=" << (3<4) << endl;

   cout << "false=" << (4<3) << endl;

 

This works in C++, and something similar can be tried in other languages to see whether it is accepted and what it prints. The situation is a little more complicated, since a value of zero is equivalent to false and any other value is equivalent to true. So

 

   if (x) . . .

 

will be false when x is equal to zero and true otherwise. While this can be a blessing when we know what we are doing, it is also a common source of serious program errors when we are not careful. Java broke away from its