7. Accessing rows and columns
Rows and columns of a data frame can be accessed either by their names or by their numeric indexes. You can access row 'C' and column 'Population' of the data frame created in the previous sections by writing:
df row: 'C'.
df column: 'Population'.
Alternatively, you can use numeric indexes. Here is how you can ask a data frame for its third row or its second column:
df rowAt: 3.
df columnAt: 2.
An important feature of DataFrame is that when asked for a specific row or column, it responds with a DataSeries object that preserves the same indexing. This way, if you extract row 'B' from a data frame, it will still remember that 'Dubai' is a city with a population of 2.789 million people:
            |     B
------------+-------
City        | Dubai
Population  | 2.789
BeenThere   |  true
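To make the preserved indexing concrete, here is a minimal sketch. It rebuilds a small data frame whose row B matches the output above; the values in rows A and C are placeholders assumed for illustration. It also assumes that the frame is built with withRows:, columnNames:, and rowNames:, and that DataSeries answers name and keys.

```smalltalk
"Illustrative data: only row B comes from the output above,
 rows A and C are assumed values."
| df rowB |
df := DataFrame withRows: #(
    ('Barcelona' 1.609 true)
    ('Dubai' 2.789 true)
    ('London' 8.788 false)).
df columnNames: #(City Population BeenThere).
df rowNames: #(A B C).

"Extract the row by its name; the answer is a DataSeries."
rowB := df row: #B.

"The series is named after the row it came from..."
rowB name.

"...and its keys are the column names of the data frame."
rowB keys.
```

Because the keys travel with the series, the values stay labelled even after the row has been separated from its data frame.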
You can access multiple rows or columns at the same time by providing an array of names or indexes, or by specifying a numeric range. For this purpose DataFrame provides the messages rows:, columns:, rowsAt:, columnsAt:, rowsFrom:to:, and columnsFrom:to:.
df columns: #(City BeenThere).
df rowsAt: #(3 1).
df columnsFrom: 2 to: 3.
df rowsFrom: 3 to: 1.
The result will be a data frame with the requested rows and columns in the given order. For example, the last line will give you a data frame "flipped upside-down" (with row indexes in descending order).
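The "flipped" result can be checked by comparing row names, as in the sketch below. The three-row data frame is illustrative (its values are assumed, not taken from this section), and the sketch assumes that rowNames answers the row names in their current order.

```smalltalk
"Illustrative three-row frame; the values are assumptions."
| df flipped |
df := DataFrame withRows: #(
    ('Barcelona' 1.609 true)
    ('Dubai' 2.789 true)
    ('London' 8.788 false)).
df columnNames: #(City Population BeenThere).
df rowNames: #(A B C).

"Ask for rows 3 down to 1; the answer is a new data frame."
flipped := df rowsFrom: 3 to: 1.

"Row names of the new frame run from the last row to the first."
flipped rowNames.

"The original frame is left in its original order."
df rowNames.
```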
You can change the values of a specific row or column by passing an array or series of the same size to one of the messages row:put:, column:put:, rowAt:put:, or columnAt:put:. Be careful, though: these messages modify the data frame itself, so the previous values are lost.
df column: #BeenThere put: #(false true false).
As mentioned above, a single cell of a data frame can be accessed with the at:at: and at:at:put: messages:
df at: 3 at: 2.
df at: 3 at: 2 put: true.
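As a small sketch of reading and then overwriting one cell, the example below uses an illustrative three-row data frame (the values are assumed). It also assumes that the first argument of at:at: is the row index and the second is the column index.

```smalltalk
"Illustrative three-row frame; the values are assumptions."
| df before after |
df := DataFrame withRows: #(
    ('Barcelona' 1.609 true)
    ('Dubai' 2.789 true)
    ('London' 8.788 false)).
df columnNames: #(City Population BeenThere).
df rowNames: #(A B C).

"Read the cell in row 3, column 2 (the Population of the third row)."
before := df at: 3 at: 2.

"Overwrite the same cell in place, then read it back."
df at: 3 at: 2 put: 9.0.
after := df at: 3 at: 2.
```

Keeping the old value in a variable, as before does here, is one simple way to guard against losing data when a cell is overwritten.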
When working with bigger datasets, it's often useful to look at only the first or the last 5 rows. This can be done using the head and tail messages. To see how they work, let's load the Housing dataset.
df := DataFrame loadHousing.
This dataset has 489 entries. Printing all of these rows just to understand what the data looks like is unnecessary, and on larger datasets it can also be time-consuming. To take a quick look at your data, use df head or df tail. Here is the output of df head:
   |     RM  LSTAT  PTRATIO      MDEV
---+---------------------------------
1  |  6.575   4.98     15.3  504000.0
2  |  6.421   9.14     17.8  453600.0
3  |  7.185   4.03     17.8  728700.0
4  |  6.998   2.94     18.7  701400.0
5  |  7.147   5.33     18.7  760200.0
The result will be another data frame. The head and tail messages are just shortcuts for df rowsFrom: 1 to: 5 and df rowsFrom: (df numberOfRows - 4) to: df numberOfRows. But what if you want a different number of rows? You can use the parametrized messages head: and tail: with the desired number of rows.
df head: 10.
df tail: 3.
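The sketch below spells out the row ranges that head: and tail: correspond to. The last n rows of a frame start at index numberOfRows - n + 1, so that exactly n rows are selected. The three-row data frame and the variable names are illustrative assumptions, and the equivalence is generalized from the description of head and tail rather than taken from the library source.

```smalltalk
"Illustrative three-row frame; the values are assumptions."
| df n firstRows lastRows |
df := DataFrame withRows: #(
    ('Barcelona' 1.609 true)
    ('Dubai' 2.789 true)
    ('London' 8.788 false)).
df columnNames: #(City Population BeenThere).
df rowNames: #(A B C).

n := 2.

"First n rows, written as an explicit range."
firstRows := df rowsFrom: 1 to: n.

"Last n rows: start at numberOfRows - n + 1 so that exactly n rows are kept."
lastRows := df rowsFrom: df numberOfRows - n + 1 to: df numberOfRows.

"These are expected to select the same rows as the shortcuts."
(df head: n) rowNames.
(df tail: n) rowNames.
```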
You can also look at the head or tail of a specific column, since all of these messages are also supported by DataSeries:
(df column: #LSTAT) head: 2.
The result will be another series:
   | LSTAT
---+-------
1  |  4.98
2  |  9.14