pdk.DataClasses.TableClasses ($Date: 2002/12/11 09:38:43 $)
pdk/DataClasses/TableClasses.py

Versatile table data classes.

Like "real" (relational) data tables, Table instances have columns with a name and a type.

Creating a table is easiest with the new_table factory:

t = new_table(["FIRST","SECOND","THIRD"],
              data = [[1,2,3.],[4,5,6.]],
              columnTypeCodes = ["i","i","f"],
              columnLabels = ["first column","second column"])

To get at the column data, do

columndata = t["FIRST"]

where columndata will be a NumPy array of typecode "i", since this is the type code of the column.

Accessing row data works like this:

rowdata = t[ 0 ]

where rowdata will be a Record instance, which is essentially a single-row table, i.e., it also features column data access by name:

value = rowdata["FIRST"]

will make value have the value 1.

Available column typecodes: "i" (integer), "f" (float), "c" (character, i.e., string), "b" (boolean), and "D" (date/time).

Tables provide a number of services:

  • selecting records with arbitrary callbacks
  • indexing with a single/multiple columns or an arbitrary expression
  • sorting
  • joining
  • complex sub-setting (e.g., a[("FIRST","THIRD")] returns a new table with only the "FIRST" and "THIRD" columns)
  • pickling
  • exporting to/importing from delimited ASCII files (using the export_table and import_table standalone functions).
  • grouping and aggregating of table records

FOG 01.2000,08.2002

 
Classes
            
GroupedTable
Table
    Record
TableFactory
TableIndex
__builtin__.object
    TableColumn
    TableStructure
pdk.ErrorClasses.pdkError(exceptions.StandardError)
    TableError
    TableFactoryError
 
class GroupedTable
     
Purpose:splits a table into groups defined by the levels of one or several ordinal columns

 
   Methods defined here:
__getitem__(self, level)
__init__(self, table, groupKey)
Parameters:
  • table: Table instance to be grouped
  • groupKey: a single column key or a tuple of column keys that define the groups via the Table.levels method or a callable that accepts a record and returns the level for this record or None indicating that this record should be excluded from the groups

Note that the groups are formed by appending each record in the input table table to exactly one (freshly instantiated) group table. The number of group tables is defined by the number of levels induced by groupKey.

__len__(self)
aggregate(self, aggregationKey, statistics=<function sum>, column=None)

aggregates the groups in this table group using the common column key aggregationKey. For each group, aggregationKey is passed to the Table.__getitem__ method of the corresponding Table instance, and the result is passed to the statistics callable.

Value:a dictionary :: { <level> : <aggregated quantity> }
compress(self)
getGroupKey(self)
returns the group key underlying the table groups (see constructor for possible values).
groupSizes(self)
returns a dictionary ::

{ <level> : <group size> }

for all groups in this grouped table.

groups(self)
returns a dictionary ::

{ <level> : <group table> }

where <group table> is a Table instance comprising all records of the input table that belonged to the group defined by <level>.

levels(self)
returns the list of levels for the grouped table.
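
The grouping and aggregation behavior described above can be sketched with plain Python dictionaries. This is only an illustration under stated assumptions, not the pdk implementation: records are modeled as dicts rather than Record instances, and group_records and aggregate_groups are hypothetical helper names.

```python
# Illustrative stand-in only: records are plain dicts, not pdk Table/Record objects.
def group_records(records, group_key):
    """Group records by a column key, a tuple of column keys, or a callable.

    A callable may return None to exclude a record from all groups.
    """
    groups = {}
    for record in records:
        if callable(group_key):
            level = group_key(record)
        elif isinstance(group_key, tuple):
            # Assumption: multi-column levels are the concatenated string values.
            level = "".join(str(record[k]) for k in group_key)
        else:
            level = record[group_key]
        if level is None:
            continue  # this record is excluded from the groups
        groups.setdefault(level, []).append(record)
    return groups


def aggregate_groups(groups, aggregation_key, statistics=sum):
    """Return {level: statistics(values of aggregation_key in that group)}."""
    return {
        level: statistics([r[aggregation_key] for r in members])
        for level, members in groups.items()
    }


records = [
    {"SITE": "a", "COUNT": 1},
    {"SITE": "b", "COUNT": 2},
    {"SITE": "a", "COUNT": 3},
]
groups = group_records(records, "SITE")
totals = aggregate_groups(groups, "COUNT")
sizes = {level: len(members) for level, members in groups.items()}
```

Each record lands in exactly one group, so the group sizes add up to the number of records that were not excluded.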
 
class Record(Table)
     
Purpose:container class for data from a single record of a Table
Detail:keeps most of the interface of a full table, but restricts data access to a single row of data.

 
   Methods defined here:
__contains__(self, other)
implements containment. Returns true if the values for all column keys in self that are also in other are equal. Note that this does not include the case where other defines additional keys not defined by self. This way, the only presumption made about other is that it supports sequential access.
__eq__(self, other)
implements an equality test. Two records compare equal if all their fields compare equal.
__getitem__(self, key)
__init__(self, data, structure, **optionD)
__setitem__(self, key, value)
items(self)
keys = getColumnKeys(self)
update(self, *recordItemTT, **recordDataD)
values(self)
 
class Table
     
Purpose:a record-oriented data table (or "data base") with variable column data types
Detail:

Adds record-based operations to the base class.

Also adds row labels, which are perceived as cases. See the new_table factory function for flexible creation of tables.

Tables may contain different data types in different columns (the behavior adopted here is very much derived from S-plus frames). The data are kept as NumPy object arrays and converted to the corresponding column type only when necessary (and if possible).

The following column type codes are recognized:
  • "i": integer
  • "f": float
  • "c": character (string)
  • "b": boolean
  • "D": date/time
ToDo:make skip() honor the current index (i.e., have the sequence of records be defined by the current index)

 
   Methods defined here:
__add__(self, other)
__array__(self)
__delitem__(self, key)
deletes the row (key is an integer) or column (key is a string) specified by key.
__eq__(self, other)
implements an equality test. Two tables are equal if all their records compare equal.
__getitem__(self, key)
returns the record (key is an integer), column (key is a string), element (key is a 2-tuple), or sub-table (key is a slice, a tuple, or a list) specified by key. If key is a slice, it will retrieve along rows (integer values for start and stop) or along columns (string values for start and stop). If key is a list, the return value will be a sub-table made up from the specified rows (list of integers) or columns (list of strings). If key is a tuple of strings, this will also return a sub-table made up from the specified columns.
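
The key dispatch described above can be summarized in a small classifier. This is a sketch, not the library's code: classify_key is a hypothetical helper, and it assumes that an element key is a 2-tuple starting with an integer row index (as in the field keys documented for __setitem__), while a tuple made up only of strings selects columns.

```python
def classify_key(key):
    """Return a label for what Table.__getitem__ is documented to return."""
    if isinstance(key, int):
        return "record"
    if isinstance(key, str):
        return "column"
    if isinstance(key, slice):
        # String bounds select along columns, integer bounds along rows.
        bounds = [b for b in (key.start, key.stop) if b is not None]
        if any(isinstance(b, str) for b in bounds):
            return "sub-table (columns)"
        return "sub-table (rows)"
    if isinstance(key, tuple):
        if key and all(isinstance(k, str) for k in key):
            return "sub-table (columns)"
        # Assumption: an element key looks like (row index, column key or index).
        if len(key) == 2 and isinstance(key[0], int):
            return "element"
    if isinstance(key, list):
        if key and all(isinstance(k, str) for k in key):
            return "sub-table (columns)"
        if all(isinstance(k, int) for k in key):
            return "sub-table (rows)"
    raise KeyError(key)
```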
__getstate__(self)
__iadd__(self, other)
__init__(self, data, structure, dateTimeFormat=None, rowLabels=None, rowLabelPrefix=None, indices=None, **infoD)
__len__(self)
__setitem__(self, key, value)
replace the row (key is an integer) or column (key is a string) or field (key is a 2-tuple (<integer>,<integer>) or (<integer>,<string>)) specified by key with value. If key is an integer, value is expected to have an .items() method that returns <column key> : <column value> pairs. If key is a string, value is expected to be a sequence object of the same length as the table.
__setstate__(self, slotValueD)
__str__(self)
append(self, recordData=None, rowLabel=None)
append the data in recordData at the end of the table. See the insert method for details.
buildIndices(self)
(re-)builds all indices associated with this table.
copy(self)
returns a copy of the table.
copyStructure(self)
returns an empty table with the same (column) structure and the same indices as self.
display(self, scope=1, key=None)
display scope records starting with the current record. If key is given, show only the data for this column.
eof(self)
extend(self, other, inPlace=1, addIndices=0)
extend this table with the columns of the table other. If inPlace is true, columns of other are added to the current table; otherwise, a copy of the extended table is returned. If addIndices is true, all indices defined on other are added as well. Note that the row labels are not updated or checked for inconsistencies.
find(self, indexName, indexValue, exact=1)
returns the index of the record for which the index specified by indexName has the value given in indexValue. Raises an IndexError if no such record exists and a KeyError if the specified index does not exist.
getColumnIndex(self, columnKey)
getColumnKey(self, index)
getColumnKeys(self)
getColumnLabel(self, columnKeyOrIndex)
getColumnLabelAtIndex = getColumnLabel(self, columnKeyOrIndex)
getColumnLabelPrefix(self)
getColumnLabels(self)
getColumnTypeCode(self, keyOrIndex)
getColumnTypeCodeAtIndex = getColumnTypeCode(self, keyOrIndex)
getColumnTypeCodes(self)
getData(self, copy=0)
getDataAsList(self, copy=0)
getDateTimeFormat(self)
getIndex(self, indexName)
returns the index specified by indexName.
getIndices(self)
returns all indices defined for this table.
getInfo(self, item=None)
getNumberColumns(self)
getNumberRows(self)
getPosition(self)
getRowLabel(self, index)
getRowLabelPrefix(self)
getRowLabels(self, key=None)
getShape(self)
getStructure(self, copy=0)
go(self, n=0)
set the record pointer to record number n
hasIndex(self, indexName)
checks for the presence of an index indexName.
index(self, indexName, indexObject, maintain=1, persistent=1)
create a table index indexName using the index object indexObject. See the TableIndex class for further information about the parameters.
innerJoin(self, other, column)
perform an inner join with other. See the .join method for further information.
insert(self, position, recordData=None, rowLabel=None)
insert the data in recordData into the table at position position. recordData is a mapping object with a .items() method or None, in which case an empty record is inserted.
isEmptyCell(self, row, column)
join(self, other, key, mode=0)

join the table with another table other, using the column named key as key.

mode can be:
  • JOIN_INNER: return only records from self and other that have matching entries for the column specified by key
  • JOIN_LEFT: return all records of self, leaving all columns from other for which no matching entry could be found in column key empty
  • JOIN_RIGHT: return all records of other, leaving all columns from self for which no matching entry could be found in column key empty.

Note that

a1.join(a2, key, mode = JOIN_RIGHT) == a2.join(a1, key, mode = JOIN_LEFT)
ToDo:implement JOIN_OUTER mode
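
The three implemented join modes can be illustrated on lists of dict records. This is a minimal sketch under stated assumptions, not the pdk implementation: join_records is a hypothetical helper, empty cells are represented by None, and on a column name collision the value from the first table wins.

```python
JOIN_INNER, JOIN_LEFT, JOIN_RIGHT = 0, 1, 2  # values from the module's Data section


def join_records(left, right, key, mode=JOIN_INNER):
    """Join two lists of dict records on the column named key."""
    if mode == JOIN_RIGHT:
        # A right join is a left join with the operands swapped.
        return join_records(right, left, key, JOIN_LEFT)
    right_columns = {c for rec in right for c in rec if c != key}
    lookup = {}
    for rec in right:
        lookup.setdefault(rec[key], []).append(rec)
    result = []
    for rec in left:
        matches = lookup.get(rec[key], [])
        if not matches and mode == JOIN_LEFT:
            # Keep the record; columns from the other table stay empty (None).
            row = dict.fromkeys(right_columns)
            row.update(rec)
            result.append(row)
        for match in matches:
            row = dict(match)
            row.update(rec)
            result.append(row)
    return result


a1 = [{"ID": 1, "X": "p"}, {"ID": 2, "X": "q"}]
a2 = [{"ID": 2, "Y": "r"}, {"ID": 3, "Y": "s"}]
inner = join_records(a1, a2, "ID")
left = join_records(a1, a2, "ID", JOIN_LEFT)
right = join_records(a1, a2, "ID", JOIN_RIGHT)
```

In this sketch the right join is defined by swapping the operands of a left join, which mirrors the identity stated above.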
levels(self, columnKeyOrKeys, sort=0)
returns a set of the levels of the (ordinal) data specified by columnKeyOrKeys, which may be either a single column key or a tuple of column keys. In the latter case, levels are formed by concatenating the data from the specified columns (after conversion to strings).
pretty(self, printColumnLabels=1, printRowLabels=0, precision=3)
renameColumn(self, oldColumnKey, newColumnKey)
renames the column oldColumnKey to newColumnKey. All indices that refer to oldColumnKey explicitly are updated. An error is raised if an index has a callable as value function and might implicitly depend on the name of the old column.
replaceRegexp(self, columnKey, pat, rep)
replaces pattern pat in field columnKey with replacement rep. If rep is a dictionary, it is expected to provide a replacement string for each matched pattern in pat, i.e. it should look like { <pattern name> : <replacement string> }
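
A dictionary replacement of this kind can be sketched with the standard re module. The sketch assumes that "pattern name" refers to a named group in pat; replace_regexp is a hypothetical helper operating on a plain list of strings rather than on a table column.

```python
import re


def replace_regexp(values, pat, rep):
    """Replace pat in each string; rep is a string or {group name: replacement}."""
    if isinstance(rep, dict):
        def substitute(match):
            # match.lastgroup is the name of the last matched named group.
            return rep[match.lastgroup]
        return [re.sub(pat, substitute, v) for v in values]
    return [re.sub(pat, rep, v) for v in values]


cleaned = replace_regexp(
    ["n/a", "missing", "12"],
    r"(?P<na>n/a)|(?P<miss>missing)",
    {"na": "", "miss": "?"},
)
```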
retrieve(self, n=1, copy=0)
retrieve the next n consecutive records in a new table.
seek(self, indexName, indexValue, exact=1)
returns the record found by calling .find with indexName and indexValue.
select(self, conditionF, key=None)
simple record select operation - returns a table containing only records for which the function conditionF returns true. If key is None, conditionF will be passed each record contained in the table in turn; otherwise, it will be passed only the data from the specified column(s). Note that the returned table will not have any of the indices defined on the source table.
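
The two calling conventions for conditionF can be sketched as follows. This is an illustration only, with records modeled as dicts; select_records is a hypothetical helper, and passing the values of several columns as a tuple is an assumption.

```python
def select_records(records, condition, key=None):
    """Return the records for which condition returns a true value."""
    if key is None:
        # The condition sees each whole record.
        return [r for r in records if condition(r)]
    if isinstance(key, (tuple, list)):
        # Assumption: for several columns, the condition receives a tuple of values.
        return [r for r in records if condition(tuple(r[k] for k in key))]
    # The condition sees only the value of the specified column.
    return [r for r in records if condition(r[key])]


rows = [{"FIRST": 1, "THIRD": 3.0}, {"FIRST": 4, "THIRD": 6.0}]
by_record = select_records(rows, lambda r: r["FIRST"] + r["THIRD"] > 5)
by_column = select_records(rows, lambda v: v < 4, key="FIRST")
```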
setColumnLabel(self, columnKeyOrIndex, label)
setColumnLabelAtIndex = setColumnLabel(self, columnKeyOrIndex, label)
setColumnLabelPrefix(self, columnLabelPrefix)
setColumnLabels(self, labelL)
setColumnTypeCode(self, keyOrIndex, typeCode)
setColumnTypeCodes(self, typeCodeL)
setDateTimeFormat(self, formatString)
setInfo(self, **infoD)
setRowLabel(self, index, label)
setRowLabels(self, labelL)
skip(self, n=1)
skip the record pointer by n records.
sort(self, column, mode=0)
physically sorts the table by the column column, which can be given as integer, as column label, as a 1-element sequence of a column key or index, or as an arbitrary-length sequence of column keys for multi-column sorting. Note that multi-column sorting is fairly inefficient (uses a temporary TableIndex).
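
Single- and multi-column sorting can be sketched on a list of dict records. This is not how the library sorts (which, as noted, uses a temporary TableIndex for several columns); sort_records is a hypothetical helper that sorts in place with the standard list sort.

```python
from operator import itemgetter

SORTMODE_ASCENDING, SORTMODE_DESCENDING = 0, 1  # values from the Data section


def sort_records(records, column, mode=SORTMODE_ASCENDING):
    """Sort records in place by one column key or a sequence of column keys."""
    keys = [column] if isinstance(column, str) else list(column)
    # itemgetter yields a single value for one key and a tuple for several.
    records.sort(key=itemgetter(*keys), reverse=(mode == SORTMODE_DESCENDING))


rows = [{"A": 2, "B": "x"}, {"A": 1, "B": "z"}, {"A": 1, "B": "y"}]
sort_records(rows, ("A", "B"))
by_both = [(r["A"], r["B"]) for r in rows]
```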
unIndex(self, indexName)
removes the index specified by indexName. Does nothing if no index named indexName is found.

Properties defined here:
current
current getter = getPosition(self)

Data and non-method functions defined here:
DEFAULTDATETIMEFORMAT = '%b %d %Y %H:%M:%S'
DEFAULTROWLABELPREFIX = 'case'
__slots__ = ['_structure', '_data', '_info', '_dateTimeFormat', '_rowLabels', '_rowLabelPrefix', '_indices', '_current', '_last']
 
class TableColumn(__builtin__.object)
     
Purpose:simple container class encapsulating a Table column

 
   Methods defined here:
__getstate__ = getslotvalues(instance)
returns a dictionary of all values for all slots of the Python instance instance.
__init__(self, key, typeCode, label=None)
__setstate__ = setslotvalues(instance, slotValueD)
sets all key:value pairs given in slotValueD as attributes of instance. Checks first if any of the keys in slotValueD is _not_ a slot of instance, in which case a ValueError is raised.
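
The slot-based state helpers described here can be sketched as below. The bodies are an assumption reconstructed from the descriptions above, not the pdk source, and Column is a hypothetical stand-in that mirrors the documented TableColumn slots.

```python
def getslotvalues(instance):
    """Return {slot name: value} for all slots of instance."""
    return {name: getattr(instance, name) for name in instance.__slots__}


def setslotvalues(instance, slot_value_d):
    """Set the given slot values; reject keys that are not slots."""
    unknown = [key for key in slot_value_d if key not in instance.__slots__]
    if unknown:
        raise ValueError("not a slot of %s: %r" % (type(instance).__name__, unknown))
    for name, value in slot_value_d.items():
        setattr(instance, name, value)


class Column(object):
    # Hypothetical stand-in mirroring the documented TableColumn slots.
    __slots__ = ["key", "typeCode", "label"]
    __getstate__ = getslotvalues
    __setstate__ = setslotvalues

    def __init__(self, key, typeCode, label=None):
        self.key, self.typeCode, self.label = key, typeCode, label


original = Column("FIRST", "i", "first column")
state = original.__getstate__()
restored = Column.__new__(Column)  # bypass __init__, as unpickling does
restored.__setstate__(state)
```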
__str__(self)
copy(self)

Data and non-method functions defined here:
__slots__ = ['key', 'typeCode', 'label']
key = <member 'key' of 'TableColumn' objects>
label = <member 'label' of 'TableColumn' objects>
typeCode = <member 'typeCode' of 'TableColumn' objects>
 
class TableError(pdk.ErrorClasses.pdkError)
       
  
Method resolution order:
TableError
pdk.ErrorClasses.pdkError
exceptions.StandardError
exceptions.Exception

Data and non-method functions defined here:
CODES = {'append_inconsistent_type': ('inconsistent data types during append operation encountered', ''), 'end_of_file': ('End of file encountered during operation!', ''), 'extend_inconsistent_shape': ('extension of a table only works if both tables have the same number of rows', ''), 'extend_inconsistent_types': ('cannot extend a table with an object that is not a table', ''), 'invalid_key_list': ('invalid column key list found', ''), 'key_not_found': ('column key not found', ''), 'malformed_index': ('invalid table index', ''), 'value_conversion_failed': ('could not convert values to the specified column types', '')}
DOMAIN = 'Table'
 
class TableFactoryError(pdk.ErrorClasses.pdkError)
       
  
Method resolution order:
TableFactoryError
pdk.ErrorClasses.pdkError
exceptions.StandardError
exceptions.Exception

Data and non-method functions defined here:
CODES = {'init_invalid_columnkeys': ('Error during initialization: invalid column keys.', ''), 'init_invalid_typecode': ('Error during initialization: invalid typecode information', ''), 'init_record_invalid_options': ('Error during initialization: options passed that are only valid for a table', ''), 'init_sequence_conversion_shape': ('Error during initialization: irregular shape of input data sequence', ''), 'unknown_input_type': ('Unknown type of input data', '')}
DOMAIN = 'Table Factory'
 
class TableIndex
     
Purpose:maintains an ordering on the records of a table
Detail:indexObject is either a callable which accepts a record and returns a value, a string referring to a column name, or a list/tuple of strings referring to several column names. The maintain flag signifies that this index should be updated whenever a record is inserted into or deleted from the table. The persistent flag signifies that this index should be stored to disk together with the table data.

 
   Methods defined here:
__array__(self, typeCode=None)
__getitem__(self, index)
__getstate__(self)
__init__(self, indexObject, maintain=1, persistent=1)
__setstate__(self, stateT)
build(self, table)
build the index for the table table.
clone(self)
returns a copy of this index.
find(self, indexValue, exact=1)
find the record index for which the index assumes the value indexValue. If exact is true, raise an IndexError if no such index is found; otherwise, return the nearest record index with a value just bigger than indexValue (or raise an IndexError, if the biggest available value is still smaller than indexValue).
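
The exact and nearest-match lookups can be sketched with a sorted list of index values and the standard bisect module. This is an assumption about one way to realize the documented behavior, not the TableIndex implementation; find_record is a hypothetical helper.

```python
import bisect


def find_record(index_values, order, index_value, exact=True):
    """Look up a record index.

    index_values: the index values in ascending order.
    order: the record indices in the same order as index_values.
    """
    pos = bisect.bisect_left(index_values, index_value)
    if pos == len(index_values):
        # Even the biggest available value is smaller than index_value.
        raise IndexError(index_value)
    if exact and index_values[pos] != index_value:
        raise IndexError(index_value)
    return order[pos]


column = [30, 10, 20]  # one value per record
order = sorted(range(len(column)), key=column.__getitem__)
index_values = [column[i] for i in order]
hit = find_record(index_values, order, 20)
nearest = find_record(index_values, order, 15, exact=False)
```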
getIndexExpression(self)
returns the expression used to build this index.
getIndices(self)
returns an array of the actual index values for each record in the indexed table.
insert(self, tableIndex, record)
insert a record record at the table position tableIndex.
isMaintained(self)
returns True if this index should be maintained.
isPersistent(self)
returns True if this index should be persistent.
keep(self, keepIndexA)
rebuild the index using only records with indices given in keepIndexA.
setIndexExpression(self, indexExpression)
sets a new index expression for this index. indexExpression can either be a single column key or a list of column keys separated by commas.
 
class TableStructure(__builtin__.object)
     
Purpose:encapsulates the (column) structure of a Table

 
   Methods defined here:
__add__(self, other)
__delitem__ = removeColumn(self, key)
__getitem__ = getColumn(self, keyOrIndex)
__getstate__ = getslotvalues(instance)
returns a dictionary of all values for all slots of the Python instance instance.
__iadd__(self, other)
__init__(self, columns=None, columnLabelPrefix=None)

The columns parameter may be

  1. a sequence of TableColumn instances
  2. a sequence of (<column key>,<TableColumn.__init__ arg tuple>) tuples
  3. a sequence of TableColumn.__init__ arguments

columnLabelPrefix is the default column label prefix to be used for autogenerated labels.

Note that the sequence of column definitions defines the column indices.

__len__ = getNumberColumns(self)
__setitem__ = addColumn(self, key, item, copy=1)
__setstate__ = setslotvalues(instance, slotValueD)
sets all key:value pairs given in slotValueD as attributes of instance. Checks first if any of the keys in slotValueD is _not_ a slot of instance, in which case a ValueError is raised.
__str__(self)
addColumn(self, key, item, copy=1)
add a new column with key key; item is either a TableColumn instance or a tuple of constructor data for TableColumn.
copy(self)
getColumn(self, keyOrIndex)
access a column by key (string) or index (integer).
getColumnIndex(self, key)
getColumnKey(self, index)
getColumnKeys(self)
getColumnLabel(self, keyOrIndex)
getColumnLabelPrefix(self)
getColumnLabels(self)
getColumnTypeCode(self, keyOrIndex)
getColumns(self)
getNumberColumns(self)
getTypeCodes(self)
removeColumn(self, key)
setColumnKey(self, oldKey, newKey)
setColumnLabel(self, keyOrIndex, label)
setColumnLabelPrefix(self, columnLabelPrefix)
setColumnTypeCode(self, keyOrIndex, typeCode)
setLabels(self, labelL=None)
setTypeCodes(self, typeCodeL)

Properties defined here:
columnLabelPrefix
columnLabelPrefix getter = getColumnLabelPrefix(self)
columnLabelPrefix setter = setColumnLabelPrefix(self, columnLabelPrefix)
keys
keys getter = getColumnKeys(self)
labels
labels getter = getColumnLabels(self)
labels setter = setLabels(self, labelL=None)
numberColumns
numberColumns getter = getNumberColumns(self)
typeCodes
typeCodes getter = getTypeCodes(self)
typeCodes setter = setTypeCodes(self, typeCodeL)

Data and non-method functions defined here:
DEFAULTCOLUMNLABELPREFIX = 'var'
__slots__ = ['_columns', '_columnLabelPrefix']
 
class TableFactory
     
Purpose:factory class for flexible generation of Table instances
Detail:this class has no constructor; use the "new" factory function provided in this module instead.

 
   Methods defined here:
contingencyTable(self, table, columnKeys, statistics=<function sum>, mode='R')

create a contingency table from the Table instance table using levels created from ordinal column data and an optional numeric column.

Parameters:
  • columnKeys : empty OR 3-tuple

    (<col level key>, <row level key>, <cell value key> | None)
    
    • the two ordinal columns serve as column and row labels for the resulting table, respectively
    • the numeric column provides the values for the cells of the resulting table; each cell represents the sum of all values of the numeric column that were associated with a particular row/column value combination. If None is passed as numeric field, the value 1 is assumed instead.

    In the simplest case, specify one ordinal column to form the column levels ("variables"), one ordinal column to form the row levels ("samples"), and optionally one numeric column to supply the values for each unique column/row level combination in the resulting contingency table.

    It is also possible to pass tuples of column keys for the column and/or the row levels arguments, in which case the levels will be formed by concatenating the values from all specifying columns (after conversion to strings).

  • statistics: by default, all values in the same contingency category (i.e., the same column/row level combination) will be summed up (using Numeric.add). Change this behavior by passing a custom function (e.g., pdk.Math.Descriptive.average) here

  • mode: "R" or "Q". "Q" essentially returns an inverted ("variables-by-samples") table

Value:returns a Table instance which has the column levels as column keys, the row levels as row labels, and the result of applying the statistics function to all unique column/row level combinations as cell values.
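
The cell computation of a contingency table can be sketched with a dictionary keyed by (row level, column level). This is a simplified illustration, not the library's method: contingency is a hypothetical helper, it works on dict records, and it does not cover the mode parameter or tuple-valued level keys.

```python
def contingency(records, col_key, row_key, value_key=None, statistics=sum):
    """Return {(row level, column level): statistics(cell values)}."""
    cells = {}
    for r in records:
        # Without a numeric column, each record counts as 1.
        value = 1 if value_key is None else r[value_key]
        cells.setdefault((r[row_key], r[col_key]), []).append(value)
    return {cell: statistics(values) for cell, values in cells.items()}


observations = [
    {"SPECIES": "oak", "PLOT": "p1", "N": 2},
    {"SPECIES": "oak", "PLOT": "p1", "N": 3},
    {"SPECIES": "elm", "PLOT": "p2", "N": 1},
]
summed = contingency(observations, "SPECIES", "PLOT", "N")
counted = contingency(observations, "SPECIES", "PLOT")
```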
exportToDelimited(self, table, outFileNameOrStream, fieldDelimiter=None, stringDelimiter=None, writeColumnLabels=1, writeRowLabels=1, fileMode='w', header=None, lineBreak='\n')
converts the data in table to a delimited ASCII-file (record-structured) and writes it to outFileNameOrStream, which is either a file name (for a file to be opened with mode fileMode) or a stream of some sort with a .write() method. The first row of the output contains the field labels (unless writeColumnLabels is false), the first column of each row the row labels, if present (and if writeRowLabels is true). fieldDelimiter will be used to delimit data fields, stringDelimiter to delimit string values.
importFromDelimited(self, source, fieldDelimiter=None, stringDelimiter=None, hasColumnKeys=1, hasRowLabels=0, **newOptionD)
reads raw data (field-structured: FIELD1 FIELD2 ... FIELDn) from an ASCII file or stream into a datatable. Default field delimiter is DEFAULTFIELDDELIMITER, default string delimiter is DEFAULTSTRINGDELIMITER. If hasColumnKeys == True, the first line is assumed to contain the field (=column) labels. If hasRowLabels == True or the first column label is ROW_LABELS, the first data column is treated as row identifiers. Note that further keyword arguments are passed to the .new method that is used to create the imported table.
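
A round trip through a delimited text stream can be sketched with the standard csv module, using the documented defaults (',' as field delimiter and "'" as string delimiter). This is only an analogy to the export and import methods, not their implementation; export_delimited and import_delimited are hypothetical helpers that ignore row labels.

```python
import csv
import io


def export_delimited(column_keys, rows, stream,
                     field_delimiter=",", string_delimiter="'"):
    """Write a header line and the data rows; strings are delimited."""
    writer = csv.writer(stream, delimiter=field_delimiter,
                        quotechar=string_delimiter,
                        quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(column_keys)
    writer.writerows(rows)


def import_delimited(stream, field_delimiter=",", string_delimiter="'"):
    """Read the header line and the data rows; all values come back as strings."""
    reader = csv.reader(stream, delimiter=field_delimiter,
                        quotechar=string_delimiter)
    column_keys = next(reader)
    return column_keys, list(reader)


buffer = io.StringIO()
export_delimited(["FIRST", "SECOND"], [[1, "a,b"]], buffer)
text = buffer.getvalue()
keys, rows = import_delimited(io.StringIO(text))
```

Values read back this way are strings, which is the situation the convertFromStrings option of the .new method is meant for.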
new(self, columnKeyL, data=None, columnLabels=None, columnLabelPrefix=None, columnTypeCodes=None, rowLabels=None, convertFromStrings=0, **optionD)

factory function for flexible generation of Table instances.

Parameters:
  • columnKeyL: mandatory (non-empty) list of column keys
  • data: any 2-dimensional sequence type (NumPy array, list of lists, etc.) or None for empty initialization
  • columnLabelPrefix: prefix for column labels (defaults to TableStructure.DEFAULTCOLUMNLABELPREFIX)
  • columnTypeCodes: a sequence of type codes, one for each column (defaults to "c" for all columns)
  • convertFromStrings: if True, the data in data are treated as strings and a conversion to the type code(s) specified by columnTypeCodes is attempted
  • optionD: further options for the Table constructor

If initialized empty, the length of the columnKeyL and rowLabels parameters determines the shape of the underlying data array (the number of rows is zero if rowLabels is None).
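
The shape rule for empty initialization can be sketched with nested lists of None. The library uses a NumPy object array for this (see the nones function); empty_data below is a hypothetical helper that only illustrates how the shape is derived.

```python
def empty_data(column_keys, row_labels=None):
    """Return a rows-by-columns nested list filled with None."""
    # The number of rows is zero if no row labels are given.
    n_rows = 0 if row_labels is None else len(row_labels)
    return [[None] * len(column_keys) for _ in range(n_rows)]


no_rows = empty_data(["FIRST", "SECOND", "THIRD"])
two_rows = empty_data(["FIRST", "SECOND", "THIRD"], ["case1", "case2"])
```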

newRecord(self, columnKeys, data=None, rowLabel=None, **optionD)
returns a new record with the data data (an object with an .items() method). See the .new method for the possible options in optionD.
newTable = new(self, columnKeyL, data=None, columnLabels=None, columnLabelPrefix=None, columnTypeCodes=None, rowLabels=None, convertFromStrings=0, **optionD)

Data and non-method functions defined here:
DEFAULTCOLUMNLABELPREFIX = 'var'
DEFAULTDATETIMEFORMAT = '%b %d %Y %H:%M:%S'
DEFAULTFIELDDELIMITER = ','
DEFAULTROWLABELPREFIX = 'case'
DEFAULTSTRINGDELIMITER = "'"
PUNCTUATIONMAP = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f _______________...\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
 
Functions
            
aggregate_table(table, groupKey, aggregationKey, statistics=<function sum>)
convenience function to aggregate a Table instance table grouped by groupKey using aggregationKey and the aggregation statistics statistics. See the GroupedTable documentation for more details.
export_table = exportToDelimited(self, table, outFileNameOrStream, fieldDelimiter=None, stringDelimiter=None, writeColumnLabels=1, writeRowLabels=1, fileMode='w', header=None, lineBreak='\n') method of TableFactory instance
import_table = importFromDelimited(self, source, fieldDelimiter=None, stringDelimiter=None, hasColumnKeys=1, hasRowLabels=0, **newOptionD) method of TableFactory instance
group_table(table, groupKey)
convenience function to obtain a dictionary of the groups in table defined by groupKey. See the GroupedTable documentation for more details.
new_record = newRecord(self, columnKeys, data=None, rowLabel=None, **optionD) method of TableFactory instance
new_table = new(self, columnKeyL, data=None, columnLabels=None, columnLabelPrefix=None, columnTypeCodes=None, rowLabels=None, convertFromStrings=0, **optionD) method of TableFactory instance
nones(shapeT)
returns an object array of shape shapeT with all values initialized to None.
 
Data
JOIN_INNER = 0
JOIN_LEFT = 1
JOIN_OUTER = 3
JOIN_RIGHT = 2
SORTMODE_ASCENDING = 0
SORTMODE_DESCENDING = 1
 
Author
            
$Author: gathmann $