Data within a file is often grouped into related units called records. A record is a collection of data, possibly of different types, fixed in number and sequence. Each piece of data within a record is called a field. For example, Alex has 2 dogs and 1 bird, so a record holding this data would look like:
name dogs birds
Alex 2 1
The first row holds the headers, which give a name to each field within a record. The headers give meaning to the fields and define the schema, the structure into which the fields of records are organized. The second row holds the data for Alex, which follows the schema defined by the headers and therefore has exactly as many fields as the schema: three. A record should contain no extra data, because records are structured data. **Structured data** is data that follows a predefined schema and is fixed in length. Unstructured data is data that does not follow a defined schema; it is beyond the scope of this article and will not be discussed further.
Files can hold multiple records, such as the ones in pets.txt:
name dogs birds
Alex 2 1
Jane 1 0
Sally 0 2
To be used in a program, these records typically need to be read into arrays. However, an array can only hold data of a single type, while the fields of a record are often of different types, so a record cannot be stored in a single array. Instead, records are stored using parallel arrays. A parallel array is two or more arrays of the same size used to store records, where each array holds one field, and all the fields of a given record are stored at the same index across the arrays.
Since parallel arrays must be of the same size, it is typical to create them with a constant of the desired size:
const int SIZE = 10;
string names[SIZE] = {};
int dogs[SIZE] = {}, birds[SIZE] = {};
Here, up to 10 records can be stored across the arrays by storing each field of a record at the same index in each array. For example, to store Alex’s record at index 0:
names[0] = "Alex";
dogs[0] = 2;
birds[0] = 1;
The data for Alex is stored at index 0 in each of the arrays, so whenever information about Alex is needed later, it can be retrieved from that same index in each array. Nine more records can be added to the arrays; if more space is needed, the arrays must be declared with a larger size.
When working with many records, the data tends to become very large, so records are usually stored in files. To read the file pets.txt shown earlier, it must first be opened with an ifstream variable:
ifstream in;
in.open("pets.txt");
The records can then be read from the file using stream extraction until the file is completely read:
string name = "";
int dog = 0, bird = 0, count = 0;
// skip the header row (name dogs birds) so it is not read as a record
getline(in, name);
// read entire file
while (!in.eof()) {
    // get record from file
    in >> name >> dog >> bird;
    // validate record
    if (in.fail() || dog < 0 || bird < 0) {
        if (!in.eof()) {
            cout << "Error in file\n";
            in.clear();
            in.ignore(256, '\n');
        }
        continue;
    }
    // store valid records
    names[count] = name;
    dogs[count] = dog;
    birds[count] = bird;
    count++;
    // stop reading once the arrays are full
    if (count == SIZE) {
        cout << "Arrays full\n";
        break;
    }
}
// close file after reading
in.close();
The process is very similar to file I/O with simple 1D arrays, with the addition of reading several fields from the file at once and storing them across multiple arrays. The count serves as the storage index for the current record: before each read, count equals the index where the next record belongs, since arrays in C++ are indexed starting at 0. The same count is used for every parallel array, so the fields of a record share a consistent index, and it is incremented only after the record has been stored in every array.
Write a program that reads records of pokemon statistics from a file, then lets the user enter a pokemon’s name and, if that pokemon is stored, outputs its stats. It can be assumed that the numerical fields of the records will be valid.
Consider the file pokemon.txt which contains:
pokemon hp attack defense
Bulbasaur 45 49 49
Charmander 39 52 43
Squirtle 44 48 65
Pikachu 35 55 40
Mewtwo 106 110 90
Mew 100 100 100
The file must first be opened:
#include <fstream>
#include <iostream>
using namespace std;

int main() {
    // open the file of pokemon
    ifstream pokemonFile;
    pokemonFile.open("pokemon.txt");
    return 0;
}
Once the file is opened, its records can be read field by field into the parallel arrays:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
    // open the file of pokemon
    ifstream pokemonFile;
    pokemonFile.open("pokemon.txt");

    const int SIZE = 10;
    string header = "", pokemon[SIZE] = {};
    int hps[SIZE] = {}, attacks[SIZE] = {}, defenses[SIZE] = {}, count = 0;

    // skip the header row (pokemon hp attack defense) so it is not read as a record
    getline(pokemonFile, header);

    // read entire file
    while (pokemonFile >> pokemon[count] >> hps[count] >> attacks[count] >> defenses[count]) {
        // increase count
        count++;
        // stop reading once the arrays are full
        if (count == SIZE) {
            cout << "Arrays full\n";
            break;
        }
    }
    // close file after reading
    pokemonFile.close();
    return 0;
}
Because the numerical fields are assumed to be valid, the stream extraction in the loop condition only fails once the end of the file is reached and no further record can be read. Stream extraction skips whitespace, so blank lines at the end of the file do not matter. A failed extraction makes the stream evaluate to false, which lets the read itself serve as the loop condition. Otherwise, each record’s fields are saved directly into the parallel arrays, the counter is incremented, and the loop stops early if the parallel arrays become full.
The name of the pokemon the user wants can then be read in and searched for in the pokemon array:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
    // open the file of pokemon
    ifstream pokemonFile;
    pokemonFile.open("pokemon.txt");

    const int SIZE = 10;
    string name = "", header = "", pokemon[SIZE] = {};
    int hps[SIZE] = {}, attacks[SIZE] = {}, defenses[SIZE] = {}, count = 0;
    bool found = false;

    // skip the header row (pokemon hp attack defense) so it is not read as a record
    getline(pokemonFile, header);

    // read entire file
    while (pokemonFile >> pokemon[count] >> hps[count] >> attacks[count] >> defenses[count]) {
        // increase count
        count++;
        // stop reading once the arrays are full
        if (count == SIZE) {
            cout << "Arrays full\n";
            break;
        }
    }
    // close file after reading
    pokemonFile.close();

    // get pokemon to search for
    cout << "Enter a pokemon to search for: ";
    cin >> name;

    // find index of pokemon
    for (int i = 0; i < count; i++) {
        // see if current pokemon is the one being searched for
        if (pokemon[i] == name) {
            // output stats and stop looping
            cout << "Name: " << pokemon[i] << endl
                 << " HP: " << hps[i] << endl
                 << " Attack: " << attacks[i] << endl
                 << " Defense: " << defenses[i] << endl;
            found = true;
            break;
        }
    }

    // output error if the pokemon is not found
    if (!found) {
        cout << name << " is not a pokemon in your pokedex.\n";
    }
    return 0;
}
Since the names in the array are all strings without spaces, the stream extraction operator can be used to read in the name the user wants to search for. The pokemon array, which contains the names of the pokemon, is then searched from front to back for that name. If the name is found at the current index i, the pokemon’s other information is accessed and output from the other parallel arrays using the same index i. The boolean found is then set to true so the not-found message is not displayed, and the loop is terminated early using break. If the name is never found, found remains false, so the final selection runs and outputs the entered name followed by “ is not a pokemon in your pokedex.\n”.
A sample run of this program where the pokemon is found:
Enter a pokemon to search for: Pikachu
Name: Pikachu
 HP: 35
 Attack: 55
 Defense: 40
And a sample run of this program where the pokemon is not found:
Enter a pokemon to search for: Charizard
Charizard is not a pokemon in your pokedex.
Record - A collection of data, possibly of different types, fixed in number and sequence.
Field - Each piece of data within a record.
Headers - Provide a name for each field within a record.
Schema - The structure into which the fields of records are organized.
Structured Data - Data that follows a predefined schema and is fixed in length.
Unstructured Data - Data which does not follow a defined schema.
Parallel Array - Two or more arrays of the same size used to store records, where each array holds a field of the individual records in the same index across the arrays.
To Be Added Later