Dan Kelley edited this page Jan 9, 2017 · 5 revisions

This work is being done in branch dk-1093-large-rdi, not in an older branch called dk-1093. (I am experimenting with the idea of more informative branch names, of the form developerInitials-issueNumber-WordsSeparatedWithHyphens.)

  • 2017 Jan 4. I think things are working now, for blocks where the from and to arguments yield a subset that is small enough to fit into R. However, I do not think this is the common use case. When I work with data, I would likely prefer to use the by argument to get a rough overview of the whole timeseries, before focussing on smaller time intervals. I need to write more C code to handle by in this way, and so I would say the work is only 1/4 done. Remaining tasks:

    1. Handle the by argument better, by filling up an unsigned char array with the results of a series of seek and fread calls.
    2. Handle the case of numeric from and to faster (hand these arguments to the existing C function -- easy peasy).
    3. See whether the present scheme of determining the segment pointers is inefficient. The present code reads the whole file twice: a first pass merely counts pointers (for a memory allocation) and a second pass stores them into the allocated memory. Another approach would be to have a growable allocation, so I will try that, now that I have a 6 GB file as a test case. (The worry with growable allocation is that time will be spent copying that memory, especially if the growth factor is small, and that we can still run out of memory if the growth factor is large.)
  • 2017 Jan 8. I think this is working now. The code now does all of the profile-finding work in C, not in R. (I was really only weaving back and forth with R so that I could interpret times in R ... but then I realized that we can handle that by assuming GMT times, so standard C libraries work fine for constructing time as numeric.) The code has been tested quite thoroughly: it passes the build suite, the local test suite, and my private test suite, and it reproduces the old data(adp) properly. I merged dk-1093-large-rdi into the dk branch, and have asked Clark to try this out for a while. If things are seen to be okay, we might get a merge into develop within a few days. (I don't want to leave the branch hanging for too long because we have other rdi work to do.)
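
Tasks 1 and 3 in the Jan 4 entry can be sketched in C. This is a minimal illustration under stated assumptions, not the code in the branch. It scans a file once, storing segment offsets in an array that doubles when full (so no counting pass is needed), and then copies every by-th segment into one unsigned char buffer with fseek and fread. The names OffsetList, offset_list_push, find_segments and gather_by are hypothetical. The two-byte start marker and the fixed segment length are also simplifying assumptions; real files would need their headers parsed to find segment lengths.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Growable list of byte offsets, one per segment start (hypothetical type).
 * Offsets are long to match fseek(); where long is 32 bits, files over
 * 2 GB would need a 64-bit seek such as POSIX fseeko(). */
typedef struct {
    long *offset;    /* offsets of segment starts */
    size_t n;        /* entries in use */
    size_t capacity; /* entries allocated */
} OffsetList;

/* Append one offset, doubling the capacity when the array is full.
 * Returns 0 on success, -1 if memory cannot be allocated. */
int offset_list_push(OffsetList *list, long value)
{
    if (list->n == list->capacity) {
        size_t new_capacity = list->capacity ? 2 * list->capacity : 1024;
        long *tmp = realloc(list->offset, new_capacity * sizeof *tmp);
        if (tmp == NULL)
            return -1; /* the old array is still valid */
        list->offset = tmp;
        list->capacity = new_capacity;
    }
    list->offset[list->n++] = value;
    return 0;
}

/* Single pass over the file: record the offset of each occurrence of the
 * two-byte marker b1,b2 (assumed here to flag the start of a segment). */
int find_segments(FILE *fp, int b1, int b2, OffsetList *list)
{
    int prev = EOF, c;
    long pos = 0; /* offset of the byte about to be read */
    assert(fp != NULL && list != NULL);
    while ((c = fgetc(fp)) != EOF) {
        if (prev == b1 && c == b2) {
            if (offset_list_push(list, pos - 1) != 0)
                return -1;
            prev = EOF; /* do not let matches overlap */
        } else {
            prev = c;
        }
        pos++;
    }
    return 0;
}

/* Copy every by-th segment (each assumed to be seg_len bytes long) into a
 * newly allocated buffer. Returns NULL on failure; the caller frees it. */
unsigned char *gather_by(FILE *fp, const OffsetList *list, size_t by,
                         size_t seg_len, size_t *n_out)
{
    assert(by > 0 && seg_len > 0);
    size_t n_keep = (list->n + by - 1) / by;
    unsigned char *buf = malloc(n_keep ? n_keep * seg_len : 1);
    if (buf == NULL)
        return NULL;
    size_t k = 0;
    for (size_t i = 0; i < list->n; i += by) {
        if (fseek(fp, list->offset[i], SEEK_SET) != 0 ||
            fread(buf + k * seg_len, 1, seg_len, fp) != seg_len) {
            free(buf);
            return NULL;
        }
        k++;
    }
    *n_out = k;
    return buf;
}
```

With a growth factor of 2, the total copying done by realloc stays proportional to the final array size, at the cost of holding up to twice the memory actually needed. That is the trade-off raised in task 3.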
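
The Jan 8 remark about assuming GMT can also be illustrated. Once a timestamp is taken to be GMT, turning its calendar fields into a numeric time is pure arithmetic, with no time zone lookup. The sketch below uses integer day counting rather than any particular library call, so it is a stand-in technique and not necessarily what the branch does. (Standard mktime interprets its input as local time, and timegm is not part of standard C.) The function names days_from_civil and utc_seconds are hypothetical. The result is seconds since 1970-01-01 00:00:00 UTC, which is the numeric form R uses for POSIXct.

```c
#include <assert.h>

/* Days from 1970-01-01 to the given Gregorian calendar date.
 * Years are counted from March so that the leap day falls at year end. */
long days_from_civil(int year, int month, int day)
{
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    long y = year - (month <= 2);                /* Jan, Feb belong to the prior year */
    long era = (y >= 0 ? y : y - 399) / 400;     /* 400-year cycle number */
    long yoe = y - era * 400;                    /* year within the cycle, 0..399 */
    long mp = month > 2 ? month - 3 : month + 9; /* months since March, 0..11 */
    long doy = (153 * mp + 2) / 5 + day - 1;     /* day within the March-based year */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; /* day within the cycle */
    return era * 146097 + doe - 719468;          /* shift so 1970-01-01 is day 0 */
}

/* Seconds since 1970-01-01 00:00:00, with all fields assumed to be GMT. */
double utc_seconds(int year, int month, int day,
                   int hour, int minute, double second)
{
    return 86400.0 * (double)days_from_civil(year, month, day)
         + 3600.0 * hour + 60.0 * minute + second;
}
```

For example, utc_seconds(2017, 1, 8, 0, 0, 0.0) gives 1483833600, which R would display as 2017-01-08 00:00:00 when the time zone is set to UTC.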