The term BLOB is used in the computer world to describe a data entity that is not of a standard primitive data type (e.g., number, date, time, character string), particularly when applied to field definitions in a relational database. Database suppliers have rationalized the term as an acronym for “Basic Large OBject” or, more commonly, “Binary Large OBject.” In geographic information systems, BLOB database fields are often used to store important data, such as coordinates, terrain elevation or slope matrices, or images.
The original relational databases were designed in the early 1970s for commercial and accounting data, using standard data types to handle entities such as counts (integers), money (fixed-point numbers), dates, and text, all of which were fixed-length fields. The contents of a field could always be loaded directly into the limited computer memory available at the time. The Structured Query Language (SQL) that underpins relational database architecture allowed for direct creation, manipulation, and analysis of these standard data types.
As database usage spread from the commercial into the scientific and technical world, there grew a need to hold data that were too big to be handled in this way or were not of an existing data type. So, the database software suppliers invented new flexible data types that could hold large amounts of unstructured data and be loaded into memory in sections. One of the first of these was the segmented string data type of the RDB database from Digital Equipment Corporation (DEC). Other suppliers used different names, but in the computer industry, they were often informally and collectively referred to as “BLOB” fields (the choice of term influenced by the cult status of the 1958 film The Blob). The Apollo database system was one of the first to document “BLOB” as an acronym, as “Basic Large OBject,” but subsequent market leaders, including Informix, Oracle, and Microsoft SQL Server, established the expansion in common usage today: “Binary Large OBject.”
While these BLOB fields allowed handling of new kinds of data (such as GIS data), initially SQL could not see inside them, so analysis and modification were possible only using dedicated applications (such as some commercial GIS). Subsequently, as object orientation became prevalent in programming, several database software suppliers (such as Oracle and Informix) developed object extension mechanisms that provide the best of both worlds: they can store arbitrary data in underlying BLOB fields but still provide access through SQL. Using these extension mechanisms, new data types have been defined specifically for geographic information that meet the industry standard “simple features” specification of the Open Geospatial Consortium (OGC).
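To make the idea of a standardized geometry inside a binary field concrete, the following is a minimal Python sketch of the well-known binary (WKB) layout that the OGC simple features specification defines for a two-dimensional polygon. The function names polygon_to_wkb and wkb_to_polygon are hypothetical, chosen for illustration. Only little-endian byte order is handled, and real database products may use their own internal formats and expose WKB only as an exchange representation.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag value for little-endian data
WKB_POLYGON = 3        # geometry type code for a 2D polygon


def polygon_to_wkb(rings):
    """Encode a list of rings, each a list of (x, y) tuples, as WKB bytes."""
    # Header: byte-order flag, geometry type, number of rings.
    parts = [struct.pack("<BII", WKB_LITTLE_ENDIAN, WKB_POLYGON, len(rings))]
    for ring in rings:
        parts.append(struct.pack("<I", len(ring)))  # number of points in ring
        for x, y in ring:
            parts.append(struct.pack("<2d", x, y))  # two 8-byte doubles
    return b"".join(parts)


def wkb_to_polygon(wkb):
    """Decode little-endian polygon WKB back into a list of rings."""
    # The fields are read as little-endian and rejected below if the
    # flag says otherwise (big-endian input is not handled in this sketch).
    byte_order, geom_type, n_rings = struct.unpack_from("<BII", wkb, 0)
    if byte_order != WKB_LITTLE_ENDIAN or geom_type != WKB_POLYGON:
        raise ValueError("only little-endian polygon WKB is handled here")
    offset = 9  # 1 byte flag + 4 bytes type + 4 bytes ring count
    rings = []
    for _ in range(n_rings):
        (n_points,) = struct.unpack_from("<I", wkb, offset)
        offset += 4
        flat = struct.unpack_from(f"<{2 * n_points}d", wkb, offset)
        offset += 16 * n_points
        rings.append(list(zip(flat[0::2], flat[1::2])))
    return rings


square = [[(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (0.0, 0.0)]]
wkb = polygon_to_wkb(square)
decoded = wkb_to_polygon(wkb)
```

Because the layout is fixed by the specification, any application that knows it can decode the bytes, which is what lets a database extension interpret the contents of the field rather than treat them as opaque.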
So, why is geographic information such as polygon coordinates now more commonly stored in BLOB fields than in the earlier normalized relational form using columns of simple numbers? The usual answer is that it can be stored in that way, but then cannot be retrieved efficiently enough for common GIS operations such as screen map drawing. This is because the relational storage model has no implicit sequence of rows in a table, so getting the coordinates into memory in the right order requires an index lookup and read for each vertex. A GIS feature like the coastline of Norway may have many thousands of vertices. In contrast, if the polygon coordinates are stored in the database as an array within a BLOB field, then they can be retrieved in a single read access to the database (in the same order as they were stored) into a memory structure that the GIS application can use directly for fast drawing or analysis.
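The contrast between the two storage forms can be sketched in Python using the standard sqlite3 and struct modules. This is an illustration under stated assumptions rather than how any particular GIS stores its data: the table names vertex and feature, the helpers pack_coords and unpack_coords, and the byte layout (a point count followed by little-endian doubles) are all hypothetical, and the coordinates are made-up sample values.

```python
import sqlite3
import struct


def pack_coords(coords):
    """Pack [(x, y), ...] as a uint32 count followed by little-endian doubles."""
    flat = [value for xy in coords for value in xy]
    return struct.pack(f"<I{len(flat)}d", len(coords), *flat)


def unpack_coords(blob):
    """Inverse of pack_coords: return the vertices in their stored order."""
    (n,) = struct.unpack_from("<I", blob, 0)
    flat = struct.unpack_from(f"<{2 * n}d", blob, 4)
    return list(zip(flat[0::2], flat[1::2]))


# Made-up sample ring; a real coastline would have thousands of vertices.
ring = [(5.0, 58.0), (5.5, 58.9), (6.1, 59.4), (5.0, 58.0)]

conn = sqlite3.connect(":memory:")
# Normalized form: one row per vertex, with an explicit sequence column
# because rows in a table have no implicit order.
conn.execute(
    "CREATE TABLE vertex (feature_id INTEGER, seq INTEGER, x REAL, y REAL)"
)
# BLOB form: one row per feature, with all coordinates in a single field.
conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, geom BLOB)")

conn.executemany(
    "INSERT INTO vertex VALUES (1, ?, ?, ?)",
    [(i, x, y) for i, (x, y) in enumerate(ring)],
)
conn.execute("INSERT INTO feature VALUES (1, ?)", (pack_coords(ring),))

# Normalized retrieval: many rows, and the order must be requested.
rows = conn.execute(
    "SELECT x, y FROM vertex WHERE feature_id = 1 ORDER BY seq"
).fetchall()

# BLOB retrieval: a single value, already in stored order.
(blob,) = conn.execute(
    "SELECT geom FROM feature WHERE feature_id = 1"
).fetchone()
from_blob = unpack_coords(blob)
```

Both queries return the same vertices. The normalized form, however, needs one row per vertex plus an ORDER BY on the sequence column, whereas the BLOB form fetches one value that unpacks directly into an in-memory array.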