LWD DATA AND QUALITY

A complete set of LWD data was recorded in all Leg 171A holes using the Schlumberger-Anadrill compensated dual resistivity (CDR) and compensated density neutron (CDN) tools. Although these tools differ slightly from conventional wireline logging tools, they are based on the same physical principles and yield results comparable to those of wireline logging. One of the main differences is that the data are recorded as a function of time rather than depth. The downhole data acquisition systems are synchronized with a system on the rig that monitors time and drilling depth. After drilling is completed, the data are downloaded from memory chips in the tools and converted from time to depth. In contrast to conventional wireline logging, depth mismatches between different logging runs cannot occur because all of the data are obtained during a single logging run.

A full description of the principles and measurements performed by the LWD tools is given by Anadrill-Schlumberger (1993) and Shipboard Scientific Party (1998). All Leg 171A holes (1044A, 1045A, 1046A, 1047A, and 1048A) were successfully logged with both the CDR and the CDN tools, and the data are considered to be of overall good quality. This is the most complete and comprehensive data set of in situ geophysical measurements in an accretionary wedge drilled by ODP. Physical and chemical properties measured by the CDR and CDN tools include spectral gamma ray (GR); thorium, uranium, and potassium content (Th, U, and K); computed gamma ray (CGR); formation bulk density (ROMT); photoelectric effect (PEF); differential caliper; attenuation resistivity (ATR); phase shift resistivity (PSR); and neutron porosity (TNPH). Additional parameters of geotechnical significance, such as the rate of penetration and weight on bit, are also collected. The radius of investigation and vertical resolution of LWD logging tools vary depending on the measuring principle and the measured property. For example, the PSR curve provides shallow resistivity estimates compared to the deeper reading ATR curve. The PSR and ATR measurements are most accurate in low-resistivity formations (<2 Ωm) (Anadrill-Schlumberger, 1993), which is the typical case in accretionary wedges, where sediments are unconsolidated and porosities tend to be high (>40%). The TNPH measurement responds not simply to formation porosity but to the hydrogen content of the bulk rock. Thus, in clay-rich formations TNPH records the combined effect of porosity and clay content. Chemical elements with large neutron capture cross sections, such as gadolinium, may also affect the neutron porosity readings; unfortunately, no gadolinium content measurements are yet available for the Barbados accretionary wedge sediments. TNPH measurements are most accurate in formations with porosities of no more than 40% (Theys, 1991). Porosities in the Barbados accretionary wedge are as high as 70%, resulting in noisy and scattered TNPH data.

Statistical Methods and Theoretical Background

A description of the basic onboard data treatment is given in the Initial Reports volume of Leg 171A (Moore, Klaus, et al., 1998). Here, a detailed and expanded data-processing procedure is described and documented. Excellent reviews of general statistical techniques, their use in the geosciences, and examples from borehole geophysics are given by Backhaus et al. (1996), Brown (1998), Bucheb and Evans (1994), Davis (1986), Doveton (1994), Elek (1990), Harvey and Lovell (1989), Harvey et al. (1990), Howarth and Sinding-Larsen (1983), and Rider (1996).

Data Preparation

The statistical methods described in this paper require that the observational data set (i.e., the geophysical measurements) be normally distributed. When this is not the case, the observations should be transformed so that they more closely follow a normal distribution. For example, electrical resistivities often appear to follow a lognormal distribution, and application of a logarithmic transform will yield observations that are more normally distributed. Erroneous values, when they can be clearly identified, must also be omitted from the analysis. Fortunately, LWD generally provides large, reliable data sets so that this editing procedure has little negative effect on the analysis.
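As an illustration of this editing and transformation step, the following minimal Python sketch assumes the LWD curves are held in a pandas DataFrame with hypothetical column names matching the curve mnemonics above (ATR, PSR, ROMT, TNPH); the cutoff values are purely illustrative and are not those used in this study.

    import numpy as np
    import pandas as pd

    def pretreat(logs: pd.DataFrame) -> pd.DataFrame:
        """Log-transform resistivities and drop clearly erroneous samples."""
        out = logs.copy()
        # Resistivities are close to lognormal, so a log10 transform brings
        # them closer to a normal distribution.
        for col in ("ATR", "PSR"):
            if col in out:
                out[col] = np.log10(out[col])
        # Remove physically implausible values (illustrative thresholds):
        # nonpositive bulk densities and neutron porosities outside 0%-100%.
        if "ROMT" in out:
            out = out[out["ROMT"] > 0]
        if "TNPH" in out:
            out = out[(out["TNPH"] >= 0) & (out["TNPH"] <= 100)]
        return out.dropna()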

Finally, before beginning the statistical analysis, the observational data should be rescaled by subtracting the mean and dividing by the standard deviation (i.e., a "standardization" of data). The resulting values will be dimensionless and will have a mean of zero and a standard deviation of 1. This permits comparison between all the observations regardless of their original scaling.
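A one-line sketch of this standardization, reusing the hypothetical DataFrame from the previous example: each column has its mean subtracted and is divided by its standard deviation, yielding dimensionless curves with zero mean and unit standard deviation.

    def standardize(logs: pd.DataFrame) -> pd.DataFrame:
        # z-score each column: subtract the mean, divide by the standard deviation
        return (logs - logs.mean()) / logs.std(ddof=0)

    z = standardize(pretreat(logs)).to_numpy()  # standardized observation matrix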

Factor Analysis

Factor analysis (FA) is a technique for examining the interrelationships among a set of observations. It is used to derive a small set of uncorrelated variables, called factors, that adequately explains the variance observed in the original observational data set (Brown, 1998). Often such an analysis reveals structure in the data set by identifying which observations are most strongly correlated. Interpretation of these correlations contributes to an understanding of the underlying processes that are being measured. A significant advantage of FA is that the number of variables can be dramatically reduced without losing important information; in other words, the dimensionality of the observational data set can be reduced. Half a dozen or more interrelated variables might be reduced to perhaps two or three factors that account for nearly all the variance in the original data set. Visualizing two or three factors is much simpler than visualizing the entire data set.

In comparisons of the German and U.S. literature, FA is sometimes confused with principal component analysis (PCA), but there is a significant difference between the two techniques. Strictly speaking, principal components are the eigenvectors of the covariance or correlation matrix of the observations. Statistical considerations such as probability or hypothesis testing are not included in PCA (Davis, 1986). Often, though, PCA forms the starting point for FA. In FA, a series of assumptions is made regarding the nature of the parent population from which the samples (i.e., observations) are derived. For example, the observations are assumed to follow a normal distribution. Such assumptions provide the rationale for the operations that are performed and the manner in which the results are interpreted (Davis, 1986).
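For concreteness, a minimal sketch of PCA in this sense, assuming z is the standardized observation matrix from the data-preparation sketch (samples in rows, variables in columns):

    import numpy as np

    def pca_from_correlation(z: np.ndarray):
        """Principal components as eigenvectors of the correlation matrix."""
        corr = np.corrcoef(z, rowvar=False)       # variable-by-variable correlation matrix
        eigvals, eigvecs = np.linalg.eigh(corr)   # eigendecomposition (symmetric matrix)
        order = np.argsort(eigvals)[::-1]         # sort by decreasing explained variance
        return eigvals[order], eigvecs[:, order]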

Another way of explaining the difference between FA and PCA lies in the variance of the variables (communality) that is analyzed. In FA, attempts are made to estimate and eliminate variance caused by error and variance that is unique to each variable (Brown, 1998). The result of FA concentrates on variables with high communality values (Tabachnick and Fidell, 1989); only the variance that each variable shares with the other observed variables is available for analysis and interpretation. In this investigation, the FA method is used because error and unique variances only confuse the picture of the underlying processes and structures. Factors and factor loadings were calculated from the rescaled logging curves using standard R-mode factor analysis procedures (Davis, 1986) on the variables at each site. A Kaiser Varimax factor rotation (Davis, 1986) was applied because the matrix of factor loadings is often not unique or easily explained. Factors are calculated by extracting the eigenvalues and eigenvectors from the matrix of correlations or covariances. With appropriate assumptions, the factor model is simply a linear combination of underlying variables and properties. A factor is taken to be significant for an underlying property if it adds a significant amount of variance or, in practical terms, if its eigenvalue is >1. Factors with eigenvalues <1 account for less variation than one of the initial variables.
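A hedged sketch of this factor extraction, using scikit-learn rather than the software packages listed below, with the number of factors chosen by the eigenvalue > 1 criterion (z is again the standardized observation matrix, and pca_from_correlation is the helper defined in the previous sketch):

    from sklearn.decomposition import FactorAnalysis

    eigvals, _ = pca_from_correlation(z)
    n_factors = int(np.sum(eigvals > 1.0))       # Kaiser criterion: keep eigenvalues > 1
    explained = eigvals[:n_factors].sum() / eigvals.sum()  # approximate fraction of variance retained

    fa = FactorAnalysis(n_components=n_factors, rotation="varimax")
    scores = fa.fit_transform(z)                 # factor "logs" (one column per factor)
    loadings = fa.components_.T                  # variable-by-factor loading matrix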

Theoretically, because they are maximally uncorrelated, each factor represents an underlying rock property such as porosity, lithology, fracture density, water content, or clay type. In reality this is not strictly the case, because there is obviously no precondition that the rock properties themselves will be uncorrelated. Indeed, it is possible to envision highly nonlinear interrelations between rock properties such as porosity, lithology, fracture density, fluid content, and clay type. As a first-order interpretation, though, FA provides an objective, rapid, and methodical approach for identifying the major features of an observational data set. Also, because many borehole geophysical tools respond primarily to porosity and lithology, Elek (1990) argued that the first two factors (i.e., the two factors accounting for the highest degree of variance in the observations) derived from FA will also relate directly to porosity and lithology. This is a reasonable assertion when the interaction between the various rock properties is known to be relatively simple. Such is the case at the Barbados accretionary wedge, where the sediments are unlithified, there is little secondary mineralization, the large-scale porosity trends are predictable, and the fluid content is well known. In many respects, accretionary wedge sediments are unusual compared, for example, to the range of rock parameters typically encountered in petroleum industry applications.

Generally, for the Leg 171A LWD data sets, more than 80% of the variance observed in the input variables can be described by the first two or three factors (Tables T1, T2, T3, T4, T5, T6, T7, T8, T9, T10). In other words, the explained variance remains >80% even though the number of variables has been reduced from as many as seven to two or three.

Cluster Analysis

After performing FA, statistical electrofacies are defined using cluster analysis. Clustering techniques are generally used to group individuals or samples into a priori unknown groups. The objective of cluster analysis is to separate the groups on the basis of their measured characteristics, with the aim of maximizing the distance between groups. Hierarchical clustering methods yield a series of successive agglomerations of data points on the basis of successively coarser partitions. One of the most common agglomerative hierarchical clustering methods is the Ward method (Davis, 1986), which is used in this study.

Before the cluster analysis is applied, the factor logs that are used as input variables are resampled to a 1-m depth interval using a finite-impulse-response, low-pass antialiasing filter to reduce the number of data points. This step, although not strictly necessary, has two advantages. First, the cluster analysis, in particular the hierarchical Ward method, is a time- and memory-intensive calculation, and reducing the number of data points results in faster calculations. Second, this step yields a cluster log that does not show excessive detail (i.e., a new cluster every few centimeters). At the resolution shown in the figures, no loss of information is visible, which justifies this reduction. After this data reduction of the factor logs, a hierarchical cluster analysis using a Euclidean norm (the "Ward method"; see Davis, 1986) was performed on the two or three decimated factors that accounted for the greatest amount of variance in the initial data set; a sketch of this workflow is given below. This allowed the identification of statistical electrofacies, or logging units, with distinct combinations of rock physical and chemical properties (e.g., Serra, 1986). A dendrogram, a tree diagram showing the similarity or connectivity between samples and clusters (e.g., Doveton, 1994), is used to decide how many clusters are significant and useful. For all sites, the number of clusters varies between 4 and 6. Of course, the likelihood of a greater number of significant clusters increases in deeper boreholes as the number of observations increases.
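The workflow described above can be sketched as follows, again in Python and as an illustration only; scores is the factor-score matrix from the factor analysis sketch, the decimation factor q is hypothetical and depends on the original sample spacing, and five clusters are requested purely as an example.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import decimate
    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

    q = 7  # illustrative decimation factor giving roughly one sample per meter
    factors_1m = np.column_stack(
        [decimate(scores[:, i], q, ftype="fir") for i in range(scores.shape[1])]
    )

    Z = linkage(factors_1m, method="ward", metric="euclidean")  # Ward linkage, Euclidean norm
    dendrogram(Z, no_labels=True)               # tree diagram used to choose the cluster count
    plt.show()
    clusters = fcluster(Z, t=5, criterion="maxclust")  # e.g., five electrofacies (logging units)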

Several commercial software packages can be used to perform the multivariate statistical methods described above. For this investigation we used WINSTAT 3.1 (Kalmia Software) and MVSP 3.0 (Kovach, 1998) on a PC platform under Windows NT 4.0 with 128 MB of RAM.
