Determining
the Lower- and Upper-Bound of an Array
How
Many Elements Does 10 Get Us?
Array
Operations===where does this go??
Design
Issues for Array Slices
Perl
Shift and Push for Arrays
This work is licensed under a
Creative Commons Attribution-No Derivative Works 3.0 United States License.
Simple variables are called elementary data items. They are manipulated as a unit. For example here are some elementary or simple variables:
int count
= 0;
float rate = 15.50;
But often we want groups or aggregations of elementary data objects. Examples are arrays or records. Objects such as arrays or records are called data structures. They are an aggregate of other data objects. In this chapter, I go over arrays. There are so many ways to handle arrays in the different programming languages, that we could devote a whole book to just arrays. But this chapter will provide a fairly good review on the many programming techniques used with arrays and will point the reader to languages that can be studied for those that want more than what is provided here.
Arrays allow us to represent many values with one variable name. In some languages (e.g. COBOL) arrays are called tables. An individual value in an array is called an element. A subscript or index is used to indicate which one of the array elements we want to use.
While arrays work about the same in most languages there are some interesting differences. Here are some design issues for arrays.
There are arrays in some language that has each of the above choices.
Most descriptions of arrays would indicate that an array is group of homogenous type elements, that is the same type (like all integer or all double). But then scripting languages such as JavaScript, Perl, and Ruby showed up with arrays that can have elements of any type, including other arrays!
Some languages do great things with arrays. If you are
interested in what a language can do with arrays, you might look at APL, which
has an extensive set of array operations. But APL needed many special
characters for operations and thus needed special input and output devices,
which probably doomed the language. Other languages that use regular keyboards
but still do wonderful things with arrays are FORTRAN 77 (and later versions),
PL/I, and
One interesting challenge is how do we tell arrays from other things like functions. This is sometimes called name space for arrays. ALGOL 60 contributed the square bracket [ ] for enclosing array subscripts. Since FORTRAN was already available the people designing ALGOL had probably run into the problem of using round parentheses for both functions sqrt(85.0) and arrays taxrate(4). For example, if we have the following FORTRAN statement:
z = 2 * baz(z)
Is baz an array or a function? In FORTRAN baz could be either an array or a function since both use round parentheses. This makes syntax checking for the FORTRAN compiler difficult and error messages often misleading. This problem is covered in more detail in the Parentheses section of chapter 1.
The second question with using arrays, is how do we tell arrays from simple variables? Can we have an array name and a variable name that are the same in the program? Thus can we have an array named rates[] and a variable names rates? Each language has its own rules about this. But just reading a programming textbook will often give few hints. Try using identical array and variable names in the same program and see what happens. You can make some sophisticated guesses on what will happen. For example, languages that have array operations will make it impossible to have identical names for arrays and variables. For example, some languages (e.g., PL/I) allow this pseudocode for the array rates:
rates
= 0
In these languages all elements of the rates array are set to zero. But this would mean we can not have a simple variable by the name of rates also. Array operations are covered later on in this chapter.
The third array notation question is how to indicate multiple-dimensioned arrays. In FORTRAN and other languages, we use:
dimension scores(5,4)
where array dimensions are listed in one set of parentheses. But the C family of languages use:
int
scores[5][4];
where each array dimension is in a separate set of brackets. Table x.1 illustrates some two dimensional array examples by language.
|
Notation |
Languages |
|
rates(2, 3) |
FORTRAN, BASIC, COBOL, |
|
rates[2][3] |
Java, C++, C#, Modula-2 |
|
rates[2, 3] |
C#, Pascal, Modula-2 |
Array Notation by Language
Table x.1
As you can see, most older languages use parentheses for their arrays (the early input devices did not have brackets), but newer languages use brackets. Thus the newer languages can clearly indicate that zap(x) is a function, and zap[x] is an array. C# and Pascal use only one set of brackets, (i.e., rates[2, 3]), for a multidimensional array, while Java and C++ use a set of brackets, (i.e., rates[2][3]), for each dimension of the array. The situation where C# uses two sets of brackets is a special situation, which is covered later in this chapter when ragged arrays are discussed. Modula-2 allows either [i, k] or [i][k] for array indexes.
The declaring of arrays is similar in many languages but there are a few variations. To illustrate the differences between languages we will declare the same arrays in several languages. First, we declare a 1-dimensional integer array maxy with 10 elements. Next, we declare a floating point 2-dimensional array total with 3 rows and 5 columns. Finally, if possible in that language, we declare a floating point array tides with indexes from 5 to 5. Here are some examples by language.
Java/C++/C:
int
maxy [9];
//arrays start at zero.
float total [2][4];
type maxy is array (1..10) of integer;
type total is array (1..3, 1..5) of
float;
type tides is array (-5..5) of float;
ALGOL:
integer array maxy [
real array total [1:3, 1:5];
real array tides [-5:5];
C#:
int[] maxy = new int[9]; //arrays start at zero.
float[,] total = new float[2,4];
COBOL:
05 MAXY
PICTURE 9999 OCCURS 10 TIMES.
05 ROWS-TOTAL OCCURS
3 TIMES.
10 TOTAL PICTURE
9999V99 OCCURS 5 TIMES.
FORTRAN IV:
DIMENSION MAXY(10),
TOTAL(3,5) !arrays start at one.
REAL, DIMENSION (-5:5)
:: TIDES ! FORTRAN 77
Pascal, Moldula-3 similar):
var
maxy = array
[1..10] of integer;
total = array [1..3, 1..5] of real;
tides = array [-5..5] of real;
As you can see the array declarations are slightly different by language. The numbers used for the array bounds change depending on whether the array starts at zero or one. Some of the languages can handle arrays that use indexes of any starting value, like used for the array tides. Modern languages like ALGOL-60 (joke) can do tricky things like that. It is interesting that few modern programming languages can start an array at any place except zero.
How array size is indicated has several variations. First, can we use variables or just constants for array size?
dim taxrate(100) or dim taxrate(size)
Some languages, such as early FORTRAN, limit array declarative sizes to just integer literals. But even in FORTRAN we can pass a variable to a function or subroutine and then use the variable for the array size:
FUNCTION SUM(TAXRATE,
SIZE)
DIM TAXRATE(SIZE)
While array bounds are often restricted to an integer or integer expression, a few languages are more generous. For example, in BASIC, we can use:
dim tax(11.7)
and the array size 11.7 will be rounded to 12. I am not clear why we need this but I am sure many people can tell me. Other languages allow real numbers to be used but truncate the value to an integer.
The C family allows variables to be used if the variable has a value before execution of that function. In C we could use #define to do what we need.
#define SIZE 100
main() {
int
taxrate(SIZE);
int
otherrates(SIZE * 2);
Not only can we use the integer constant SIZE, but we an also use integer expressions (SIZE * 2) to indicate the size of the array. In C++ we can change the #define line to:
const int SIZE=100;
and the rest would be the same.
Mid versions of BASIC allowed any starting and ending value. So we could have arrays like the following:
dim years (1998 to 2008)
dim tide.level(-
Many modern languages do not allow this freedom with declaring arrays, instead making the programmers deal with array subscript offsets (often unsuccessfully) to take care of a very simple situation.
Since BASIC allows programmers to indicate both the lower
and upper array bound [ DIM TIDE
(-
DIM TIDE (-
Lbound(TIDE, 1) returns 5, and Ubound(TIDE, 1) returns 3. Many languages including C#, Perl, and Python have methods or functions to determine the length of the array.
FORTRAN was one of the first high-level language to have arrays. Since FORTRAN is for scientific programmers, the arrays were organized a little differently. For example, FORTRAN arrays are organized by column (column major order) instead of by row. This means the first subscript increases the most rapidly, followed by the second subscript, and so on. For example:
DIMENSION K(4, 3)
declares an array of 12 elements, and the order in memory will be K(1,1), K(2,1), K(3,1), K(4,1), K(1,2), ....
For many situations it may not matter how the array is stored in memory, but it does matter for operations that transmit entire arrays. One example is input and output, which FORTRAN can handle without specifying the elements:
READ *, K
The above will read the next 12 items of data into the array K. If we assume the input data is the first 12 integers then the array will look like this when full:
1 5 9
2 6 10
3 7 11
4 8 12
There are other operations such as use of DATA statement to fill an array where similar results will occur. FORTRAN was one of the few languages to store arrays by column.
The next historical language, COBOL stores arrays by row (row major order). COBOL and most other languages store elements by row. So if we read in the first 12 integers, the array would look like this in most other languages:
1
2 3
4
5 6
7
8 9
10 11
12
Today most languages fill and process arrays by row instead of columns. But for the early mathematical programmers multi-dimension arrays were just multiple vectors and that arrangement seemed correct. FORTRAN DO loops can be used to change the order of array processing. Business applications treat arrays like records, and thus like row major order. But scientists often need columns of numbers and then column major order is more useful. It is interesting that all modern languages have followed the COBOL order instead of the FORTRAN order.
One unanswered or at least not agreed on characteristic for arrays is where to start them. Should arrays start with element zero or element one? If you ask programmers this question you can probably figure out what language they started programming in. Next, if we start arrays at zero and ask for 5 elements, do we get elements 0-5 or 0-4? The next sections will survey how different languages have answered these questions.
In FORTRAN II and IV arrays automatically start with element 1, and cannot start with a lower or higher value. FORTRAN 77 changed this, allowing negative or positive array starting positions. Thus in FORTRAN 77 and later versions we can declare arrays like the following:
integer years(1998:2008)
real tides (-3:5)
integer gamepoints
(-5:5, 2:5)
So the subscripts of the arrays years go from 1998 to 2008, the subscripts of the array tides goes from 3 to 5 (-3, -2, -1, 0, 1, 2, 3, 4, 5), and gamepoints is a two-dimensional array with negative values on the first index.
years:
array (1998..2008) of integer;
Visual Basic grants the same wide variety in declaring arrays.
Modula-2 and its related languages allow array indexes to start at any value (positive or negative) and allow other cardinal values for subscripts. Here are a few examples:
TYPE
TideLevel = ARRAY [-10..10]
OF REAL;
vector = ARRAY [1..50] OF REAL;
lcaseCount
= ARRAY [a..z] OF INTEGER;
TruthType = ARRAY BOOLEAN OF INTEGER;
Most of these are understandable, except maybe the last line that uses FALSE and TRUE for the subscript.
As you can see by the above there is common question of where arrays start, zero or one. ALGOL got around that problem by using a bound pair for declaring the arrays (see above) and thus the beginning array subscript is indicated.
Why do C arrays start at zero instead of one? The address of the array is the beginning of the array. Thus the array address points to the first (zero) element of the array. So for the array rates, all the following point to the zero element (in C) of the array:
*rates
*(rates+0)
rate[0]
After C started their arrays this way most modern languages adopted the same system of starting all arrays at the zero element.
In early BASIC, arrays of size ten or less elements did not have to declared. The subscripts went from 1 to 10. In the second edition of BASIC, the zero subscript was allowed, so now we could use A(0) or B(0,0). The other interesting item is that if you declare an array
DIM X(20)
you actually have elements X(0) to X(20) available (21 elements). Likewise, an array of B(4,5) goes from B(0,0) to B(4,5) which gets you 30 elements instead of the 20 you get with Java or C. In C when we ask for int x(20) we start at element 0 but only get to element 19, which I have never liked. Too bad Kernighan and Ritchie didnt read BASIC manuals more, or perhaps they did and rejected my suggestion. They are a lot smarter than me, so I assume they did the right thing. One can easily argue that either approach is best. That is, a declaration of x(20) will get either elements 0-19 or 0-20.
Since BASIC was designed for amateur programmers, the interpreter checked for array bound errors. Not many early languages do this for us, probably because this error checking would be a lot more difficult for a compiler than an interpreter. Nowadays computers are very fast so you see C# and VB checking array indexes.
There are some problems here, if the language does array processing. For example, if we have a command that prints arrays do we print the zero element (row) or not?
QBASIC and Visual BASIC 6.0 have solved this problem by letting you set the default to be zero or 1. But then all the arrays in that program start at zero or one in that language. To do this in these versions of BASIC we use the statement:
option base 0
The only choices for option base are zero or one. Since arrays can have a starting and ending value in newer version of BASIC, individual arrays can override this use of option base.
Since we cannot agree on where arrays start (zero, one, or elsewhere), it is not surprising the end of the array is also not agreed on. In FORTRAN and BASIC an array declared of size 10, ends with the 10th subscript. While in the C family of languages the array declared of size 10, ends with the 9th subscript.
In FORTRAN (elements 1-10) and C (elements 0-9), we get 10 elements. While in BASIC (elements 0-10) we now get 11 elements. I have always thought the BASIC solution was best, since I do not have to use the zero element, if not interested, and I did not have to ask for 11 elements (like in C) if I want the element with a subscript of 10. It is interesting how many people feel strongly about either solution.
Thus when we declare an array of 10 elements, we get a different starting subscript (0 or 1), and a different number of elements, depending on the language. The following table illustrates this in different languages:
|
Language |
Element range |
# of elements |
|
FORTRAN, PL/I |
1-10 |
10 |
|
C++/Java/C# |
0-9 |
10 |
|
VB .Net/QBasic |
0-10 |
11 |
Array Elements by Language
Table x.x
The above table shows some of the variations and there are many languages in each category.
Once we have arrays, we need some way to initialize the array elements. C/C++ automatically initializes external and static array elements to zero, but does not initialize local arrays, much to the dismay of many programming students. C# initializes all arrays when they are declared. Most languages do not initialize array elements to any value and have no easy way to initialize large number of elements to the same value.
For example, suppose we have an array KZAP of 10 elements and want to initialize the first three elements to zero, the next four to 9, and the last two elements to 3. In FORTRAN we would do the following:
DIMENSION KZAP(10)
DATA KZAP /3*0, 4*9, 2*3/
Many a time have I wanted some way to do something similar in C++. In Java or C++ we would do the following:
int
kzap[9] = {0,0,0,9,9,9,9,3,3};
which is not too bad as long as I do not need 100 elements instead of 10 elements.
PL/I improved on the FORTRAN method as follows:
DECLARE KZAP(10)
FLOAT DECIMAL
INITITAL((3)0, (4)9, (2)3);
which will initialize the array to the same values as the previous two languages ( 3 zeros, 4 nines, and 2 threes. This is an improvement because the array declaration and initialization is done in one statement rather than two. FORTRAN needs both a DIMENSION and a DATA statement. Both PL/I and FORTRAN have ways to repeat values in other declarations. This seems like a very nice method of initializing array elements.
Python allows us to repeat values for initializing arrays but their syntax is a little different. If we want an array named numbers with the first eight integers we could do this in Python:
numbers
= [1,2,3,4,5,6,7,8]
But if we wanted eight zeros instead we could do this:
numbers
= [0]*8
Likewise, we obtain an array with values similar to the previous PL/I example as follows:
numbers
= [0]*3 + [9]*4 + [3]*2
Thus all these languages have easy ways to initialize elements in an array to multiple copies of the same value.
We can initialize only part of the array in most languages as follows:
int
kzap[9] = {0,,2,,4,,6,,8,};
The above line would initialize the even-number elements, but leave the odd-number elements not initialized. While this may be the intention, the more likely occurrence is an extra comma being a mistake and then one element left out as follows:
int
kzap[9] = {0,1,2,3,4,,6,7,8,9};
In this code, element five was not initialized because of
the extra comma.
type codes is array (1..5) of integer;
codes
:= (4, 1, 6, 3, 99); --positional association.
codes
:= (1..5 =>4); --sets all element
to 4.
codes
:= (1|3|5 =>9, others => 7);
--uses an or.
codes
:= (1..3 =>x, others => z);
--use variables.
codes
:= (4=>99, 2=>7, 3=>88, 1=>72, 5=>78);
--above uses subscript to indicate value.
The composite symbol => is assignment. For you few students not
programming (joke) in
Trying to keep compatible with previous versions of BASIC and Visual Basic hampers Visual Basic. It takes two steps to declare and to initialize the array elements to non-zero values. For example, we do this in Visual Basic 6.0:
Dim intSums(5) as Integer declares 6 elements and
initializes them to zero.
Dim intAges() as Integer = {0, 5, 12, 18, 45, 49}
In Visual Basic 6.0 we cannot declare the upper size of the array and initialize the values in the same statement. Thus the parentheses in intAges() is left empty. Some languages do the opposite, but then you can have conflicts with the elements asked for and the values supplied.
In Perl the @ symbol indicates we are using an array. Perl allows us to use the range operator to initialize arrays. For example:
@numbs = (1, 2, 3,
4, 5)
will set the first five elements of array @numbs to the digits 1 to 5. But we can use the range operator to do it a little easier, as follows:
@numbs = (1..5)
This range operator is available in some other languages, but in Perl, the range operator can be used with simple or complicated strings too. For example:
@letters = (d,
e, f, g, h);
can be change to
@letters = (d .. h);
Ranges do not have to be as simple as above, since we can do things like
part01
.. part05
which get us part01,
part02, part03, part04, part05
Or even things like aa .. cz, which I will let you figure out.[1]
When initializing Perl arrays, expressions can be used. Here is an example:
@x = (4, 2+9,
3*4));
But the values will be flatten before initializating the array, as follows:
@x = (4, 11,
12));
Any one interested in how arrays can be handled in a programming language needs to look at FORTRAN 90. Older versions of FORTRAN also did many amazing things but the newer version added even more. First, there are several different ways to declare an array. Suppose we want an integer array of 10 elements.
FORTRAN IV
DIMENSION X(10),
MINX(10)