Bookkeeping with awk and MATLAB

Spring 2010

If Socrates had lived in the age of big data, would he be famous for saying "the unquantified life is not worth living"? In the second decade of the 21st century we already find it completely natural to measure with the utmost precision our distances walked, energy taken in through meals, oxygen inhaled through exercise, blood pressure and heart rate, all recorded automatically by the portable computers we insist on calling "phones" (despite almost never using them to conduct voice conversations across great distances) and other wearable devices. In the preceding decade, while software engineers were laying the groundwork for this trend, I had already allowed my personal record-keeping to encroach on such territory as cash transactions, SmarTrip travels, and music listening choices, but only with the primitive tools of pencil and paper. For easier post-processing of the data, I started making these entries in dedicated software programs, including Grisbi and Gnucash for the monetary transactions. When the lag time of X forwarding and the unavailability of such specialized programs on campus computer labs limited the use of such graphical interfaces to the ever-decreasing hours spent at home, I sought out a record-keeping system that would work within the text-based environment of a terminal window.
Implementation details
  • For each account, a tab-separated text file requires the following fields, using a separate line for each transaction (longer comments or unaccommodating text editor widths require you to turn off automatic line feed):
    1. date
    2. expenditure type (four-letter code) or "-" if not a debit
    3. income type (four-letter code) or "-" if not a credit
    4. "-" or some other code for a vestigial classification I no longer make
    5. "Y" or "N" depending on whether the transaction has cleared
    6. description of the transaction
    7. dollar amount
  • The script balance.awk uses awk to process the account file and constructs a summary of the month-by-month totals, as well as the current balances (what the bank knows about, versus what you know about).
  • The script uses python to process the account file, returning a week-by-week summary of spending patterns sorted into various categories.
  • The script octweek.m uses octave to read the python-formatted output (after deleting the header in line 1) and populates a vector of year-to-date totals for each spending category. These time series can be plotted or further processed with the built-in vector capabilities of the MATLAB-like environment.
Transferrable skills learned from this project
  • string operations to extract components of a date
  • conversion between string and numeric data types
  • calendar arithmetic using a look-up table for the number of days in each month
  • flow control: emulating in python the case function native to bash