Q. Dealing With Huge Data Files
I'm working with some huge data files, often falling in the 6-15 megabyte range. Using VB's native file handling has proven to be excruciatingly slow. Can I load and process the entire file in memory, instead of repeatedly hitting the disk?
A. Yes. In fact, you can do it several ways. You've probably explored the possibility of reading the entire file into a byte array or string with code similar to this:
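Public Function GetFileBytes(FileName As _
   String, Data() As Byte) As Boolean
   Dim hFile As Long
   ' Ask VB for an unused file number.
   hFile = FreeFile
   On Error Resume Next
   Open FileName For Binary As #hFile
   ' Size the array to hold the whole file, then read it in one pass.
   ReDim Data(0 To LOF(hFile) - 1) As Byte
   Get #hFile, , Data
   ' True only if none of the statements above raised an error.
   GetFileBytes = (Err.Number = 0)
   Close #hFile
End Function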
A routine like this is handy for small files, but real performance problems arise as the size of the target file grows. Also, using a String variable doubles the memory overhead and creates the potential for data corruption due to Unicode conversion issues.
I recently faced a similar challenge: I needed simultaneous access to data in up to a dozen files, each approximately 5 MB. These files contained a standard header, followed by a matrix of double-precision values. I decided to map the files directly into memory, and access their data by calculating pointers to specific offsets within the files, creating what is known as memory-mapped files. Memory-mapped files are simply files whose data has been mapped directly into your process's address space. If you're dealing with a known format, or can derive the file format from an inspection of header or similar fields, this is definitely the way to go. Certain operations in my own matrix utility were up to 20 times faster after mapping the data directly into my process's memory space.
To create a memory-mapped file, call CreateFile to open an existing file, then call CreateFileMapping to create a file-mapping object. Pass the handle returned from CreateFileMapping to MapViewOfFile to map a view of the entire file into your address space. MapViewOfFile returns the base address from which you will later calculate offsets to access your file's data. When you're through accessing the file, call UnmapViewOfFile and CloseHandle to clean up. I've wrapped these procedures in a handy class module (see Listing 1).
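The call sequence just described can be sketched in a standard module roughly as follows. This is a minimal, read-only illustration and not the CMapFile class from Listing 1: the MappedView type and the MapFileReadOnly and UnmapFile routines are hypothetical names invented for this sketch, and it assumes ANSI API declarations and a file smaller than 2 GB.

```vb
Option Explicit

Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
   ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
   ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, _
   ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
   ByVal hTemplateFile As Long) As Long
Private Declare Function CreateFileMapping Lib "kernel32" _
   Alias "CreateFileMappingA" (ByVal hFile As Long, _
   ByVal lpAttributes As Long, ByVal flProtect As Long, _
   ByVal dwMaximumSizeHigh As Long, ByVal dwMaximumSizeLow As Long, _
   ByVal lpName As String) As Long
Private Declare Function MapViewOfFile Lib "kernel32" ( _
   ByVal hFileMappingObject As Long, ByVal dwDesiredAccess As Long, _
   ByVal dwFileOffsetHigh As Long, ByVal dwFileOffsetLow As Long, _
   ByVal dwNumberOfBytesToMap As Long) As Long
Private Declare Function UnmapViewOfFile Lib "kernel32" ( _
   ByVal lpBaseAddress As Long) As Long
Private Declare Function CloseHandle Lib "kernel32" ( _
   ByVal hObject As Long) As Long
Private Declare Function GetFileSize Lib "kernel32" ( _
   ByVal hFile As Long, lpFileSizeHigh As Long) As Long

Private Const GENERIC_READ As Long = &H80000000
Private Const FILE_SHARE_READ As Long = &H1
Private Const OPEN_EXISTING As Long = 3
Private Const FILE_ATTRIBUTE_NORMAL As Long = &H80
Private Const INVALID_HANDLE_VALUE As Long = -1
Private Const PAGE_READONLY As Long = &H2
Private Const FILE_MAP_READ As Long = &H4

' Hypothetical helper type: everything needed to use and release a view.
Public Type MappedView
   hFile As Long
   hMap As Long
   BaseAddr As Long   ' address of the first byte of the file
   Size As Long       ' low 32 bits of the file size
End Type

Public Function MapFileReadOnly(ByVal FileName As String, _
   View As MappedView) As Boolean
   Dim SizeHigh As Long
   View.hMap = 0: View.BaseAddr = 0: View.Size = 0
   ' Step 1: open the existing file.
   View.hFile = CreateFile(FileName, GENERIC_READ, FILE_SHARE_READ, _
      0&, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0&)
   If View.hFile = INVALID_HANDLE_VALUE Then Exit Function
   View.Size = GetFileSize(View.hFile, SizeHigh)
   ' Step 2: create the file-mapping object (size 0,0 = whole file).
   View.hMap = CreateFileMapping(View.hFile, 0&, PAGE_READONLY, _
      0&, 0&, vbNullString)
   ' Step 3: map a view of the entire file; the return is the base address.
   If View.hMap <> 0 Then
      View.BaseAddr = MapViewOfFile(View.hMap, FILE_MAP_READ, 0&, 0&, 0&)
   End If
   ' Release whatever was opened if any step failed.
   If View.BaseAddr = 0 Then UnmapFile View
   MapFileReadOnly = (View.BaseAddr <> 0)
End Function

Public Sub UnmapFile(View As MappedView)
   ' Step 4: clean up in the reverse order of creation.
   If View.BaseAddr <> 0 Then UnmapViewOfFile View.BaseAddr
   If View.hMap <> 0 Then CloseHandle View.hMap
   If View.hFile <> 0 And View.hFile <> INVALID_HANDLE_VALUE Then
      CloseHandle View.hFile
   End If
   View.BaseAddr = 0: View.hMap = 0: View.hFile = 0: View.Size = 0
End Sub
```

A caller would declare a MappedView variable, pass it to MapFileReadOnly, work with BaseAddr and Size while the view is open, and call UnmapFile when finished. Note that CreateFileMapping fails on a zero-length file, so the function returns False in that case.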
Once you've mapped a view of your file, you can read or write any data within the file using the CopyMemory API. To do this, you must have some knowledge of where within the file the data of interest lies, as well as a firm understanding of how your data will be represented in memory. By way of example, many file formats identify themselves with unique characters in a fixed location; ZIP files contain "PK" as the first two characters. This code reads these values before proceeding with processing the rest of the file:
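Dim mf As CMapFile
Dim Buffer() As Byte
Set mf = New CMapFile
' Map the target file here, using the CMapFile member
' Listing 1 provides for that, before reading from it.
ReDim Buffer(0 To 1) As Byte
' Copy 2 bytes, starting at file offset 0, into the buffer.
mf.GetRng VarPtr(Buffer(0)), 0, 2
Debug.Print StrConv(Buffer, vbUnicode)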
The CMapFile class's GetRng function accepts a pointer to the location where the range of data should be copied. This provides maximum flexibility because you can slam data to any of VB's native data types. CMapFile also provides functions that extract Long and Double values directly.
Don't let the use of pointers intimidate you. A pointer is simply a Long that contains the address of something in memory. CMapFile does the pointer math for you by adding the file offsets you pass to the base address it stores. Your main responsibility is to determine the offsets, within your data file(s), of the data you need. Be careful that each offset you provide, plus the number of bytes you request, stays within the mapped file's size. And be even more careful that you size your buffers to accept as much data as you request. Failing to take these crucial precautions typically results in an immediate access violation.
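The pointer math and the range check can be sketched together like this. RangeIsValid and ReadDouble are hypothetical helpers written for illustration, not members of CMapFile; the sketch assumes you already hold a view's base address and size in Long variables, and that the address is below 2 GB so the addition cannot overflow.

```vb
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" ( _
   ByVal Destination As Long, ByVal Source As Long, ByVal Length As Long)

' True only when the bytes [Offset, Offset + Length) all lie
' inside a view that is ViewSize bytes long.
Public Function RangeIsValid(ByVal Offset As Long, _
   ByVal Length As Long, ByVal ViewSize As Long) As Boolean
   If Offset < 0 Or Length <= 0 Or ViewSize <= 0 Then Exit Function
   ' Written as a subtraction so the test itself cannot overflow a Long.
   RangeIsValid = (Length <= ViewSize - Offset)
End Function

' Copies the 8-byte Double found at BaseAddr + Offset into Value.
' Returns False, and leaves Value untouched, if the range is invalid.
Public Function ReadDouble(ByVal BaseAddr As Long, _
   ByVal ViewSize As Long, ByVal Offset As Long, _
   Value As Double) As Boolean
   If RangeIsValid(Offset, LenB(Value), ViewSize) Then
      ' The destination is exactly as large as the byte count requested.
      CopyMemory VarPtr(Value), BaseAddr + Offset, LenB(Value)
      ReadDouble = True
   End If
End Function
```

Checking the range before every copy costs a comparison or two, which is trivial next to the access violation it prevents.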
Q. Solving Error Message Mysteries
I'm new to API programming, and am really floundering when it comes to figuring out why my API calls fail. All the documentation says to call GetLastError to determine why an API call failed, so I'm doing that. But how do I make sense of the error code returned by GetLastError?
A. First, don't waste your time with GetLastError. Microsoft designed VB so it makes a number of API calls before and after each API call you include in your code. Due to this design, the return from GetLastError probably doesn't accurately reflect the return from your call, but instead the return from one of VB's calls. Many VB developers fall into this trap, especially those coming from other languages.
As a workaround to VB's API-calling design, Microsoft added the LastDllError property to the standard Err object. Immediately after making the call you coded, VB itself calls GetLastError and stores the return in LastDllError for your later inspection. Pass the LastDllError value to the FormatMessage API to retrieve a textual error message more suited to human consumption than a raw number (see Listing 2).
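As a rough sketch of that approach, independent of Listing 2, the function below turns an error number into system message text. ApiErrorText is a hypothetical name, the 1,024-character buffer is an arbitrary choice, and the ANSI FormatMessageA declaration is assumed.

```vb
Option Explicit

Private Declare Function FormatMessage Lib "kernel32" _
   Alias "FormatMessageA" (ByVal dwFlags As Long, _
   ByVal lpSource As Long, ByVal dwMessageId As Long, _
   ByVal dwLanguageId As Long, ByVal lpBuffer As String, _
   ByVal nSize As Long, ByVal Arguments As Long) As Long

Private Const FORMAT_MESSAGE_FROM_SYSTEM As Long = &H1000
Private Const FORMAT_MESSAGE_IGNORE_INSERTS As Long = &H200

Public Function ApiErrorText(ByVal ErrNum As Long) As String
   Dim Buffer As String
   Dim nRet As Long
   Buffer = Space$(1024)
   ' Returns the number of characters stored, or 0 on failure.
   nRet = FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM Or _
      FORMAT_MESSAGE_IGNORE_INSERTS, 0&, ErrNum, 0&, _
      Buffer, Len(Buffer), 0&)
   If nRet = 0 Then
      ApiErrorText = "Unknown error " & CStr(ErrNum)
      Exit Function
   End If
   ' System messages usually end with a line break; trim it.
   Do While nRet > 0
      Select Case Mid$(Buffer, nRet, 1)
         Case vbCr, vbLf: nRet = nRet - 1
         Case Else: Exit Do
      End Select
   Loop
   ApiErrorText = Left$(Buffer, nRet)
End Function
```

Read Err.LastDllError on the line immediately after the failing call and pass that value in, for example Debug.Print ApiErrorText(Err.LastDllError), because any later Declare call overwrites the property.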
FormatMessage, by default, retrieves standard system error message text. It can also retrieve error message text from special DLLs that contain these resources. For example, if you're working with Internet APIs, you can call LoadLibraryEx on wininet.dll before calling FormatMessage in order to resolve errors from that library.
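Pulling message text from a specific library might look something like the sketch below. ModuleErrorText is a hypothetical helper, not code from Listing 2; it assumes the library can be loaded as a data file and that its message table contains the number you pass.

```vb
Option Explicit

Private Declare Function LoadLibraryEx Lib "kernel32" _
   Alias "LoadLibraryExA" (ByVal lpLibFileName As String, _
   ByVal hFile As Long, ByVal dwFlags As Long) As Long
Private Declare Function FreeLibrary Lib "kernel32" ( _
   ByVal hLibModule As Long) As Long
Private Declare Function FormatMessage Lib "kernel32" _
   Alias "FormatMessageA" (ByVal dwFlags As Long, _
   ByVal lpSource As Long, ByVal dwMessageId As Long, _
   ByVal dwLanguageId As Long, ByVal lpBuffer As String, _
   ByVal nSize As Long, ByVal Arguments As Long) As Long

Private Const LOAD_LIBRARY_AS_DATAFILE As Long = &H2
Private Const FORMAT_MESSAGE_FROM_HMODULE As Long = &H800
Private Const FORMAT_MESSAGE_IGNORE_INSERTS As Long = &H200

' Returns the message text for ErrNum stored in LibName,
' or an empty string if the library or the message is not found.
Public Function ModuleErrorText(ByVal LibName As String, _
   ByVal ErrNum As Long) As String
   Dim hLib As Long
   Dim Buffer As String
   Dim nRet As Long
   ' Load the DLL for its resources only; none of its code runs.
   hLib = LoadLibraryEx(LibName, 0&, LOAD_LIBRARY_AS_DATAFILE)
   If hLib = 0 Then Exit Function
   Buffer = Space$(1024)
   ' lpSource is the module handle when FROM_HMODULE is specified.
   nRet = FormatMessage(FORMAT_MESSAGE_FROM_HMODULE Or _
      FORMAT_MESSAGE_IGNORE_INSERTS, hLib, ErrNum, 0&, _
      Buffer, Len(Buffer), 0&)
   If nRet > 0 Then ModuleErrorText = Left$(Buffer, nRet)
   FreeLibrary hLib
End Function
```

For instance, ModuleErrorText("wininet.dll", 12007) asks WinInet for the text of ERROR_INTERNET_NAME_NOT_RESOLVED. Loading and freeing the library on every call keeps the sketch short; code that resolves many errors would likely load it once and reuse the handle.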
Challenge to my readers: I've included the starting and ending values for Internet and NT networking error messages in Listing 2. If you're aware of standard libraries other than WinInet and NetMsg that contain similar resources, drop me a line with the filename and message range, and I'll publish an update to this topic in the future.
Karl E. Peterson is a GIS analyst with a regional transportation planning agency and serves as a member of the Visual Basic Programmer's Journal Technical Review and Editorial Advisory Boards. Online, he's a Microsoft MVP and a section leader on several VBPJ forums. Find more of Karl's VB samples at www.mvps.org/vb.