This paper covers the history and use of literals (or constants) in programming languages, from the beginning of programming to the present day. Literals in many programming languages are discussed, including modern languages such as C, Java, and scripting languages, and older languages such as
Design Issues for Integer Constants
Visual Basic .NET Type Designations
Design Issues for Floating Point Constants
Design Issues for Character Strings
Perl and UNIX Shell Character Strings
Perl Additional Escape Sequences
String Comparison
This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 United States License.
Copyright Dennie Van Tassel 2004.
Please send suggestions and comments to dvantassel@gavilan.edu
Literals, or constants, are values written in a conventional form whose value is obvious. In contrast to variables, literals (123, 4.3, "hi") do not change in value. They are also called explicit constants or manifest constants. I have also seen them called pure constants, but I am not sure that terminology is agreed on. At first glance one may think that all programming languages write their literals the same way. While there are many common conventions across languages, there are also some interesting differences.
| Literal | Explanation |
|---|---|
| 285 | Typical integer |
| 34.67 | Typical real |
| 4.23E-4 | Typical scientific |
| 140_345 | Integer in Perl or |
| true | Typical boolean |
| 0x1b or Z"1B" | Hexadecimal literal |
| 'B' | Typical character |
| "Hello" or 'Hello' | Typical character string |
| 5HHello | Old FORTRAN Hollerith string |
| null ZERO | Special literals |

Various Literals in Different Languages
Table x.1
Literals represent the possible values of the primitive types in a language. Common kinds of literals include integers, floating point numbers, Booleans, and character strings. Each of these will be discussed in this chapter.
Integers are commonly described as numbers without a decimal point or exponent. Another description for integer literals is a string of decimal digits without a decimal point. Thus the following are valid integers in all languages:
123   0   -14   21345
Integers may or may not have a sign, and they must fall within some restricted range. Negative values need to be preceded by a minus sign. If integers use 32 bits, then the maximum value is 2^31 - 1 (since one bit is needed for the sign).
There are two more integer constants available in some languages:
+45
5e2
Early C did not allow +45 since integers without a sign, such as just 45, are positive by default, so no unary positive sign was used. Thus C had a unary negative operator but no unary positive operator. But many later C compilers and Java allow the unneeded positive signs on constants. Few other languages actually forbid unary positive signs.
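A minimal C sketch of the point above, assuming an ANSI C (C89) or later compiler, where unary plus is part of the language. The function names are hypothetical helpers chosen for illustration.

```c
/* In ANSI C and later, + is a unary operator just like -, so +45 is
   accepted and has the same value and type as plain 45.
   C has no negative literals: -14 is unary minus applied to 14. */
int plus_forty_five(void) { return +45; }   /* hypothetical helper */
int minus_fourteen(void)  { return -14; }   /* hypothetical helper */
```
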
The last constant, 5e2, which would evaluate to 500, would be a floating point value in C and FORTRAN. Their rule is that a floating point constant has a decimal point OR an exponent, or both. Thus 5.0, 5e0, and 5.0e0 would all be the same floating point 5.0. But in
There are a few design issues for integers. They are:
Each of the above questions is answered yes in some language, and different languages have different answers.
Most languages have one or more default sizes for integers. On a 16-bit word size machine, integers range from -32,768 to +32,767, that is, up to 2^15 - 1. On a 32-bit word size machine, integers range from -2,147,483,648 to +2,147,483,647, that is, up to 2^31 - 1. Today 64-bit integers are common. Unfortunately, computer integers cannot have those useful commas to mark thousands.
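These ranges can be checked in C through the exact-width types of <stdint.h> (C99). This is a small sketch that assumes a platform providing the optional int16_t, int32_t, and int64_t types, as all common platforms do; the wrapper function names are hypothetical.

```c
#include <stdint.h>

/* Exact-width limits from <stdint.h>, independent of the machine's
   word size. Each value is widened to int64_t for easy comparison. */
int64_t int16_max(void) { return INT16_MAX; }   /* 32767 = 2^15 - 1 */
int64_t int16_min(void) { return INT16_MIN; }   /* -32768 */
int64_t int32_max(void) { return INT32_MAX; }   /* 2147483647 = 2^31 - 1 */
int64_t int32_min(void) { return INT32_MIN; }   /* -2147483648 */
int64_t int64_max(void) { return INT64_MAX; }   /* 2^63 - 1 */
```
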
But this is an oversimplification, since we can have hexadecimal integers and they use letters. And we may want octal values and some way to indicate the desired size of our integers. Also, the definition of integer in the previous paragraph is not true for all languages.
For example, in Ada both integer and real literals can have an exponent. Thus in Ada the following integer literals all have the value 2100:
21e2   210e+1   2100e+0
But in many other languages the exponent would indicate that the above are floating point literals. For Ada integers, the exponent must not be negative. Ada also allows us to use the underscore to improve readability. The underscore is often used to separate a number into groups of three digits, as commas are used outside of programming. Here are some examples:
1_234.56   408_847_1400   1_000_000   12_27_05   4_345e2
In most of the above numbers the underscore is placed where a comma would normally be, but the underscore can be placed in any convenient place. Perl and Ruby also allow underscores in their integers.
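To show how a compiler might read such literals, here is a hedged C sketch of a lexer helper. The name parse_underscored is hypothetical, it handles only decimal integers, it applies an Ada-style rule (each underscore must sit between two digits; Perl's rules are looser), and it does not check for overflow.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical lexer helper: drop the underscores from a decimal integer
   literal and convert the remaining digits. Returns 1 on success and
   stores the value in *out, or returns 0 for malformed input. */
int parse_underscored(const char *text, long long *out) {
    char digits[32];
    size_t n = 0;
    int prev_was_digit = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '_') {
            /* Ada-style rule: an underscore must sit between two digits. */
            if (!prev_was_digit || p[1] < '0' || p[1] > '9') return 0;
            prev_was_digit = 0;
            continue;
        }
        if (*p < '0' || *p > '9') return 0;   /* decimal digits only */
        if (n + 1 >= sizeof digits) return 0; /* too long for this sketch */
        digits[n++] = *p;
        prev_was_digit = 1;
    }
    if (n == 0) return 0;
    digits[n] = '\0';
    *out = strtoll(digits, NULL, 10);         /* underscores are purely visual */
    return 1;
}
```

With this sketch, "1_000_000" yields 1000000, while "1__0" and "_12" are rejected.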
If we have more than one size of integer, we need some way to indicate the precision of an integer constant. The C family uses an L or l (ell) after an integer to indicate a long integer. Thus 12L is a long integer. We could use the lower case l, but few can tell the difference between 12l (12 and ell) and 121 (one hundred twenty-one), so we always use an upper case L. These suffixes are useful to force arithmetic into a particular precision.
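A minimal C sketch of suffixes forcing precision. It uses LL (C99) rather than L for the sum, because long is only 32 bits on some platforms (64-bit Windows, for instance), while long long is at least 64 bits everywhere. The function names are hypothetical.

```c
#include <limits.h>

/* The L suffix gives a constant the type long. */
int long_suffix_is_long(void) {
    return sizeof(12L) == sizeof(long);
}

/* Without a suffix, INT_MAX + 1 would overflow int (undefined behavior).
   The 1LL operand makes C convert INT_MAX and add in long long. */
long long int_max_plus_one(void) {
    return INT_MAX + 1LL;
}
```
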
Besides long integers, C has unsigned integers, which use the suffix u or U. Thus we could write 15u or 15U to get the unsigned integer fifteen. Unsigned long integers are indicated with the suffix ul or UL, so 23ul or 23UL is the unsigned long integer twenty-three. For regular (signed) integers one bit must be reserved for the sign. If a variable or constant is unsigned, that bit can be used for the value instead. Thus a 16-bit signed integer ranges from -32,768 to +32,767 (a maximum of 2^15 - 1), but an unsigned integer stored in the same amount of storage can go from 0 to +65,535, which is 2^16 - 1.
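A short C sketch of the unsigned suffixes, with hypothetical function names. It also shows that unsigned arithmetic wraps around modulo 2^N rather than going negative, since the sign bit is now part of the value.

```c
#include <limits.h>
#include <stdint.h>

/* u/U makes a constant unsigned int; ul/UL makes it unsigned long. */
int unsigned_suffixes_ok(void) {
    return sizeof(15u) == sizeof(unsigned int)
        && sizeof(23UL) == sizeof(unsigned long);
}

/* 0 - 1 in unsigned arithmetic wraps to the largest unsigned value. */
unsigned int zero_minus_one_unsigned(void) {
    return 0u - 1u;   /* equals UINT_MAX */
}
```
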
If we are in a language that has long integers, then how do we use them? For example, if we write 123456789012, we do not want to end up with an integer overflow or truncation. A good compiler would automatically store this integer as a long integer, but we may want to help it (or us) with 123456789012L.
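C (since C99) behaves like the good compiler described above: an unsuffixed decimal constant gets the first of int, long, or long long that can hold its value. A hedged sketch with hypothetical function names:

```c
#include <stddef.h>

/* 123456789012 does not fit in a 32-bit int, so C gives it type long
   or long long automatically; the value is not truncated. */
long long big_literal(void) {
    return 123456789012;
}

size_t big_literal_size(void) {
    return sizeof(123456789012);   /* at least 8 bytes with 8-bit bytes */
}
```
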
In most languages long integers are restricted to some large size. Python 2 uses the same L to indicate a long integer, as in 12345678901234567890L, but Python long integers can be arbitrarily big. (Python 3 dropped the L suffix, and all of its integers have arbitrary precision.) Other languages such as Ruby and Lisp dialects also have these arbitrarily long integers, which are called bignums.
These forms of BASIC have two types of integers. The two types are integer and long integer. Early BASIC did not have types for numbers. There was no distinction between integers and floating point. But now we have several numeric types. For numeric constants a suffix is used on the number to indicate the type. Here is what they use:
| Numeric Type | Suffix | Bytes of Storage |
|---|---|---|
| Integer | % | 2 |
| Long integer | & | 4 |
| Single precision | none or ! | 4 |
| Double precision | # | 8 |

Types in BASIC
Table x.2
Thus 15% is an integer, while 15& is a long integer, and 15 (or 15!) is a floating point, single precision float. By default all numbers are real (floating point) single precision. If we want a double precision float 15, then we type 15#.
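To make the suffix rules of Table x.2 concrete, here is a hedged C sketch of how a lexer might classify a BASIC numeric literal by its last character. The function basic_literal_type is hypothetical, it assumes the input is already a well-formed numeric token, and it follows the default stated above (no suffix means single precision).

```c
#include <string.h>

/* Hypothetical lexer helper: map the trailing type character of a
   BASIC numeric literal to the type names used in Table x.2. */
const char *basic_literal_type(const char *lit) {
    size_t len = strlen(lit);
    if (len == 0) return "invalid";
    switch (lit[len - 1]) {
    case '%': return "Integer";
    case '&': return "Long integer";
    case '!': return "Single precision";
    case '#': return "Double precision";
    default:  return "Single precision";   /* no suffix: the default type */
    }
}
```
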
VB .NET has broken from its BASIC parents and changed the type-designation characters it appends to numeric literals. Whole numbers (no decimal point) are type Integer and numbers with decimal points are type Double. Otherwise, it uses a method similar to previous dialects of BASIC, but with different codes to change the default type. The VB .NET codes are as follows:
S Short integer
I Integer
L Long integer
F Single-precision floating point
R Double-precision floating point
D Decimal
So VB .NET has three types of integers and two types of floating point. It uses Decimal for decimal fractions such as dollars and cents. Thus 45S is a Short integer, 45I (or 45) is an Integer, and 45L is a Long integer. And 234.5F is a Single-precision floating point literal and 234.5R (or 234.5) is a Double-precision floating point literal. Finally, 780.23D is a Decimal (currency-type) literal.
The range of values in VB .NET is much larger than in previous versions of BASIC. For example, long integers range up to about ±9.2 x 10^18 (that is, -2^63 to 2^63 - 1). C# .NET has similar types and value ranges.
Sometimes we want a different base or radix of our constants besides base 10. Base 8 and base 16 are useful for storage addresses. The C family allows us to indicate octal constants by preceding the number with a zero. So 012 is octal 12, not decimal twelve. For octal values the range of digits is 0-7.
So putting this together with what we learned in the previous section we can use the terminating L to make the constant Long and the U to make it unsigned. Thus 012UL is the unsigned long octal value 12 or the equivalent of the decimal value 10.
For hexadecimal values we precede the number with 0x or 0X. Thus 0x12 is hexadecimal 12 (decimal 18), not decimal 12. Now the acceptable digits are 0 through 9 and A through F, and the letters may be upper or lower case. Again we can use the long integer suffix L on these too. Thus 07L is a long octal seven, and 0x7L is a long hexadecimal seven. We can also use the suffix U to make the value unsigned. Thus 0XFUL is the unsigned long hexadecimal value F, which is equivalent to the decimal value 15.
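The octal and hexadecimal examples from the last two paragraphs can be checked directly in C. A minimal sketch; the function names are hypothetical.

```c
/* Octal (leading 0) and hexadecimal (leading 0x or 0X) constants,
   combined with the U and L suffixes. */
unsigned long octal_12_ul(void) { return 012UL; }  /* octal 12 = decimal 10 */
long          hex_12(void)      { return 0x12; }   /* hex 12 = decimal 18 */
long          octal_7_l(void)   { return 07L; }    /* long octal seven */
long          hex_7_l(void)     { return 0x7L; }   /* long hexadecimal seven */
unsigned long hex_f_ul(void)    { return 0XFUL; }  /* unsigned long hex F = 15 */
```

Because the leading zero is significant, writing 010 in C means eight, not ten, which is exactly the compatibility trap FORTRAN 90 avoided.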
Ruby does the same for octal and hexadecimal literals as C does, but Ruby has added 0b for binary numbers. So in Ruby we can have hexadecimal values like 0x12, octal values like 012, and binary values like 0b1001.
FORTRAN 90 does this a little differently. It allows radix (number base) 2, 8, or 16. The value starts with the letter B for binary (radix 2), O (oh) for octal, or Z for hexadecimal. The letter is followed by a string of digits enclosed in double or single quotes. The digits must be acceptable for the desired base (no 8 or 9 in octal values). The integer value 200 would be B"11001000" in base two, O"310" in base eight, and Z"C8" in base 16. I try very hard not to be chauvinistic, but I sure like the C method better in this case.
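The same digit strings can be converted at run time with the standard C function strtol, which takes the radix as its third argument. This sketch (the wrapper name digits_in_base is hypothetical) rejects strings containing digits that are invalid for the base, mirroring the FORTRAN 90 rule; it also confirms that Ruby's 0b1001 is nine.

```c
#include <stdlib.h>

/* Convert a digit string in the given radix. Returns -1 if the string
   is empty or contains a character that is not a digit in that base
   (for example an 8 or 9 in an octal string). */
long digits_in_base(const char *digits, int base) {
    char *end;
    long value = strtol(digits, &end, base);
    if (end == digits || *end != '\0') return -1;
    return value;
}
```
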
This FORTRAN 90 solution illustrates the problem of adding a feature to an existing language. They cannot just decide to use the C solution, that all numbers starting with a zero are octal values. Millions of old FORTRAN programs would no longer work correctly when compiled on new FORTRAN 90 compilers, since 012 would be octal 12 instead of decimal 12. On the positive side of this change, thousands of old FORTRAN programmers would suddenly have employment.
2#100011#   4#203#   8#43#   10#35#   16#23#
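Each literal above is the decimal value 35 written as base#digits#. A hedged C sketch of a parser for this form follows; parse_ada_based is a hypothetical name, it accepts bases 2 through 16 as Ada does, but it ignores the underscores and exponents Ada also permits in based literals and does not check for overflow.

```c
#include <ctype.h>

/* Hypothetical parser for Ada-style based integer literals such as
   16#23#. Returns the value, or -1 for malformed input. */
long parse_ada_based(const char *lit) {
    long base = 0;
    const char *p = lit;
    while (isdigit((unsigned char)*p)) {      /* decimal base before '#' */
        base = base * 10 + (*p - '0');
        if (base > 16) return -1;
        p++;
    }
    if (p == lit || *p != '#' || base < 2) return -1;
    p++;                                      /* skip the opening '#' */
    long value = 0;
    int ndigits = 0;
    while (*p != '#') {
        int d;
        if (*p >= '0' && *p <= '9')      d = *p - '0';
        else if (*p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
        else if (*p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
        else return -1;                       /* also catches a missing '#' */
        if (d >= base) return -1;             /* digit not valid in this base */
        value = value * base + d;
        ndigits++;
        p++;
    }
    if (ndigits == 0 || p[1] != '\0') return -1;
    return value;
}
```

For example, parse_ada_based("7#66#") returns 48, and parse_ada_based("8#9#") returns -1 because 9 is not an octal digit.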
While this is kind of interesting, I do not see much use for base 7 or 11, but obviously someone did. In addition, C and FORTRAN 90 can only use octal or hexadecimal integer constants;
1. Suppose you wanted to add more bases to Java or C++. Presently, those languages can only handle decimal, octal, and hexadecimal (Java 7 and C++14 have since added binary literals with a 0b prefix). The
Reals are numbers with a decimal point; thus 4.3 is a real literal. Real numbers are called floats or floating point in some languages. Another description of reals is a number with a decimal point or an exponent (or both); thus 2e2 would be a real literal under this definition. Like integer literals, reals can be preceded by a positive or negative sign, and no commas are allowed. Some real literals are:
0.0   -4.302   7.   3.2e-4   4.9678E+3   4e-3
If the language accepts both lower and upper case, the e for exponent can be either. Languages vary on whether 4e-3 is acceptable or whether we need 4.0e-3 (with a decimal point). The e stands for exponent and means multiply by 10 raised to the power that follows. Thus
4.3e2 = 4.3 x 10^2 = 4.3 x 100 = 430.0
Scientific notation is useful for expressing very small numbers or very large values (such as your chances to win the lottery or the national debt).
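A small C sketch of the exponent rule. The compiler converts a literal such as 4.3e2 directly from its decimal text, so 4.3e2 equals 430.0 exactly; the hypothetical helper times_power_of_ten only illustrates the "multiply by 10 to a power" meaning at run time and may round differently in the last bit. In C, 4e-3 is accepted without a decimal point, and 5e2 is a double.

```c
#include <stddef.h>

/* Runtime illustration of mantissa x 10^exponent (hypothetical helper). */
double times_power_of_ten(double mantissa, int exponent) {
    double scale = 1.0;
    int steps = exponent < 0 ? -exponent : exponent;
    for (int i = 0; i < steps; i++) {
        scale *= 10.0;                  /* build 10^|exponent| */
    }
    return exponent < 0 ? mantissa / scale : mantissa * scale;
}

/* An exponent alone is enough to make a floating constant in C. */
size_t size_of_5e2(void) {
    return sizeof(5e2);                 /* same as sizeof(double) */
}
```
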
There are a few design issues for floating point constants. Here are some:
There are interesting answers to all the above questions in some languages, and different languages have different answers.
Earlier in this chapter, when we discussed integer literals, we noted that Ada integer literals can also have exponents. So for Ada, real literals must have a decimal point. Another
The C family has three types of reals: float, double, and long double, and it lets us indicate the type of a real literal. Real constants such as 3.4, 2.0, and 4.564e-2 are all stored as double by default. If we want 3.4 to be stored as a float (instead of a double), we add an f or F after the constant, like this: 3.4F or 3.4f. If we want 3.4 to be stored as a long double, we use l (lower case L) or L as we do with integers. Thus 3.4 as a long double would be 3.4L or 3.4l, but the last one looks a lot like 3.41 instead of 3.4L. All these suffixes are useful to control the amount of storage used and the precision of the result. For example:
1.0/3.0     // uses double precision.
1.0F/3.0F   // uses float precision.
1.0L/3.0L   // uses long double precision.
For the float example, both constants must be float; otherwise the arithmetic would be done in the higher type, double. For long double precision, making just one of the constants long double is enough to force the arithmetic into long double. This is explained more in the section on Coercion in the Arithmetic chapter.
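The operator sizeof reports the type of an expression without evaluating it, so a short C sketch can confirm which type each division uses. The function names are hypothetical. Note that some platforms may evaluate float arithmetic internally in higher precision (see FLT_EVAL_METHOD in <float.h>), but the type of the result is still as shown.

```c
#include <stddef.h>

size_t size_float_div(void)       { return sizeof(1.0F / 3.0F); } /* float */
size_t size_mixed_div(void)       { return sizeof(1.0F / 3.0);  } /* double: one double operand */
size_t size_long_double_div(void) { return sizeof(1.0L / 3.0F); } /* long double: one long double operand */
```
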
In FORTRAN the default real type is single precision (like float in C). If we type 4.3, it is a single precision real, but we may want it stored as a double precision real. FORTRAN uses the exponent letter D or d in place of E to indicate double precision. Thus we can write 4.3D0 or .43d1 to indicate a double precision real value. This is an easy way to force arithmetic into double precision. For example:
x = 1/3d0
will get us a double precision division because 3d0 is double precision.
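The C analog of x = 1/3d0 is a hedged but simple illustration: the type of a single constant decides whether the division is integer or floating point. The function names are hypothetical.

```c
/* Both operands int: integer division truncates toward zero. */
int int_third(void) { return 1 / 3; }

/* 3.0 is a double, so 1 is converted and the division is in double. */
double double_third(void) { return 1 / 3.0; }
```
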
FORTRAN IV has complex numbers. A complex constant is represented by two numbers in parentheses separated by a comma. The number left of the comma is the real part, and the number to the right of the comma is the imaginary part. Thus the complex constant 3 + 2i can be assigned to the complex variable x as follows:
x = (3, 2)
Fortran has all the necessary operations and functions to handle complex values. It is interesting how early in computing history complex values were handled by Fortran. Ruby uses a similar syntax for its complex constants.
In Python, complex numbers are composed of two floating-point numbers, the real part and the imaginary part, and are written by adding a J or j to the imaginary part. Thus we can write 3.0 + 4.3J for a complex number. A few other languages have built-in complex numbers and the necessary arithmetic operators and functions.
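C added complex numbers in C99 through <complex.h>, where the macro I stands for the imaginary unit. This sketch assumes a compiler that supports that header (it is optional in C11 and later, and MSVC handles it differently); make_three_plus_two_i is a hypothetical name.

```c
#include <complex.h>

/* The complex constant 3 + 2i, compare FORTRAN's (3, 2) and Python's 3 + 2j. */
double complex make_three_plus_two_i(void) {
    return 3.0 + 2.0 * I;
}
```

Squaring this value with ordinary * gives 5 + 12i, since (3 + 2i)^2 = 9 + 12i - 4.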
When we talk about single or double precision of integers and reals, we need to figure out what is doubled. Integers are the easiest to understand since we do not have to worry about an exponent or decimal point. The smallest integer can be stored in one byte, 8 bits, with one bit for the sign. That leaves 7 bits, or 2^7, which gives a range of -128 to +127. The next size of integer may use two bytes, which allows a range based on 2^15, or -32,768 to +32,767. The next largest integer uses four bytes and a range based on 2^31, or -2,147,483,648 to +2,147,483,647. As you have seen, the largest, smallest, and number of integer types are language and machine dependent, but these sizes are common across many languages.
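The pattern in the previous paragraph is one formula: with two's-complement storage, n bits give the range -2^(n-1) to 2^(n-1) - 1. A minimal C sketch with hypothetical function names, assuming 8-bit bytes and valid here for 1 through 7 bytes (8 bytes would overflow the shift in long long).

```c
/* Largest value of a signed integer occupying `bytes` bytes. */
long long signed_max(int bytes) {
    int bits = 8 * bytes;
    return (1LL << (bits - 1)) - 1;   /* 2^(bits-1) - 1 */
}

/* Smallest value: one more in magnitude than the maximum. */
long long signed_min(int bytes) {
    return -signed_max(bytes) - 1;    /* -2^(bits-1) */
}
```
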
The situation gets much more machine and language dependent for floating point values. For reals, there are two parts besides the sign, the exponent and mantissa. Thus for 3.45e-2, 3.45 is the mantissa and -2 is the exponent. The mantissa is commonly 7 places for the smallest float, 15 places for next largest float, and finally 31 places for the largest float. Not all languages have three sizes. Early languages only had one size. Newer languages tend to have three sizes, especially when the language is used for scientific programming.
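C reports these precisions through <float.h>. On IEEE 754 hardware FLT_DIG is 6 and DBL_DIG is 15, in line with the roughly 7 and 15 places quoted above (FLT_DIG counts only digits that are guaranteed to survive a round trip), while LDBL_DIG varies most from platform to platform. A hedged sketch with hypothetical function names:

```c
#include <float.h>

/* Decimal digits each floating type is guaranteed to preserve. */
int float_digits(void)       { return FLT_DIG; }
int double_digits(void)      { return DBL_DIG; }
int long_double_digits(void) { return LDBL_DIG; }
```
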
There are two ways for programs to get the precision of integers and floating-point values. The way covered so far, and the most common, is that programmers get whatever the language or hardware gives them. For example, smaller floating-point values have 7-place accuracy and larger floating-point values have 15-place accuracy. These defaults are based on the size of words in the hardware. This loss of control is mostly accepted without question. But when we expand the variety and size of machines available, the defaults change. So both FORTRAN and
FORTRAN 90 has a method similar to how C marks the precision of its real numbers, but the FORTRAN method is more powerful and flexible. But first we need to discuss why FORTRAN needs a way to vary default number precision. FORTRAN has been around for decades and is available on very small computers and very large computers. A single-precision real number might have seven significant digits and a double-precision number might have 15 on many computers. But a small computer may not have that range, and a large supercomputer may have twice the range. So if a FORTRAN program is written on one computer, a means is needed to indicate the required precision when the program is moved to a new computer.
So FORTRAN 90 provides a kind number that is used to indicate the kind of precision needed for real and integer values. For real numbers there are at least two default kind numbers and for integer values there are 3 or 4 kind numbers.
For real numbers they use the kind number 1 to indicate single (7 significant digits) or the kind number 2 to indicate double (15 significant digits). Some FORTRAN compilers may offer more significant digits under another kind number. (Kind numbers are compiler dependent; gfortran, for example, uses 4 and 8 for single and double precision.) To specify the kind of a constant, an underscore followed by a kind number is appended to the constant. Thus 3.14159_1 has a single precision kind and 3.14159_2 has a double precision kind, because it has a _2 after it. (Notice the underscore in FORTRAN has a different meaning than it has in Perl or
Integer values have a kind 1 for values in the range of 2^7, kind 2 for values in the range of 2^15, kind 3 for 2^31, and maybe kind 4 for 2^63. Thus 123456789_3 has an integer kind number 3. If a kind value is not supported by a compiler it generates a syntax error when compiling the program.
There is a great deal more to this in FORTRAN. The intrinsic functions SELECTED_INT_KIND and SELECTED_REAL_KIND request a minimum precision for integer and real values, and the result can be stored in a named constant (declared with the :: two-colon syntax) and then used as a kind value. Here are a couple of brief examples:
INTEGER, PARAMETER :: Range18 = SELECTED_INT_KIND(18)
REAL, PARAMETER :: Prec20 = SELECTED_REAL_KIND(20, 40)
First, we need to set up kind indicators. In the above two lines, Range18 can be used to indicate an integer kind with a range of at least 18 decimal digits, and Prec20 indicates real numbers with at least 20 significant digits and a decimal exponent range of at least 40. Now we can use these as we did the kind constants 1, 2, or 3.
12345_Range18
3.14159_Prec20
This was a very brief description of FORTRAN kind numbers. If you want more information you will need to find a FORTRAN 90 textbook. These kind numbers are also available for variables.
1. Your PhD thesis is to indicate how to expand C++ or Java so these languages can indicate desired precision for constants or variables. Read the previous section on FORTRAN 90 kind numbers. If your method breaks all previous C++ programs you will not obtain your PhD.
2. Perl and early BASIC do not distinguish between integers and floats. These two languages just have numbers. Do you think this is a good approach? Should we do this in OPL? Why or why not?
1. We have seen several types of numeric literals or constants. What numeric constants do you think we should have in OPL?
2. Do we want to allow for different integer literals? For example, short or long integers? Do we need one, two, three, or more types of integers? And how shall we indicate what is desired when we type an integer?
3. Do we want to allow for different real literals? For example, float, double, or long double. Do we need one, two, three or more types of real literals? How shall we write these different forms in OPL?
4. What base or radix of integers will we allow: binary, octal, hexadecimal, others? The C family has one way, FORTRAN 90 has another way, and
5. Most or all languages do not allow commas in numeric literals, like 1,234. Is this restriction still necessary? Do you think we should allow commas in numbers for OPL? Notice how
We need or want literals for true and false. These are called Booleans or logicals, depending on the language. Some languages have a reserved word or keyword for these values. Booleans are ordinal values, and usually false is less than true. The normal operations are and, or, and not.
There are a few decisions and differences for Boolean values. Here are some questions:
All versions of FORTRAN use .TRUE. and .FALSE. for their logical constants. And in FORTRAN they are called logicals instead of Booleans. Since FORTRAN does not have reserved words, FORTRAN uses a period before and after these logical literals to differentiate them from the variables TRUE and FALSE. If we print a FORTRAN Boolean variable, it will print either T or F, and those are what we need for input if reading Boolean data into a program.
The inputting and outputting of logical values is messy in most languages. If we want to use the integer 123, we can use it exactly that way as a literal constant in the program, read in the integer, or print the integer and it is all the same. It is not as simple with logical literals.
Some languages do not have a nice way to input or output Boolean values. For example, in FORTRAN, logicals print as an F or T. But when we want to read in a value for true, we can use the letter t, or period and t (i.e., .t), or period and the word true (i.e., .true), or any string that starts with the letter t, or a period, letter t, then anything. So input, output, and inside the program are all potentially different for FORTRAN logicals.
The C family of languages traditionally does not use named constants for logical values. Instead, relational and logical expressions yield 0 (zero) for false and 1 (one) for true. Thus if we tried to print 4 < 3 we would get a 0, and 3 < 4 would get us a 1. (C++ does have a bool type with true and false, but by default they print as 1 and 0.)
cout << "true=" << (3 < 4) << endl;    // parentheses needed: << binds tighter than <; prints true=1
cout << "false=" << (4 < 3) << endl;   // prints false=0
While this works in C++, something similar could be done in other languages to see whether and what it prints. The situation is a little more complicated since a value of zero is equivalent to false, and any other value is equivalent to true. So
if (x) . . .
will be false when x is equal to zero and true otherwise. While this can be a blessing when we know what we are doing, it is also a common source of bad program errors when we are not careful. Java broke away from its