Over the last few decades, computer-assisted image analysis has become a method of choice for investigating many geometrical features of soils. This analysis involves numerous steps that require subjective decisions by the individuals conducting the research. This is particularly the case with the thresholding step, required to transform the original (color or greyscale) images into the type of binary representation (e.g., pores in white, solids in black) needed for fractal analysis or simulation with Lattice–Boltzmann models. Limited information exists at present on whether different observers, analyzing the same soil, would be likely to obtain similar results. In this general context, the first objective of the research reported in this article was to determine, through a so-called “round-robin” test, how much variation exists among the outcomes of various image thresholding strategies (including any image pre-treatment deemed appropriate) routinely adopted by soil scientists. Three test images (of a field soil, a soil thin section, and a virtual section through a 3-dimensional CT data set) were thresholded by 13 experts worldwide. At the same time, the variability of the outcomes of a set of automatic thresholding algorithms, applied to portions of the test images, was also investigated. The experimental results show that experts rely on very different approaches to threshold images of soils, and that there is considerable observer influence associated with this thresholding. This observer dependence is not likely to be alleviated by adoption of one of the many existing automatic thresholding algorithms, many of which produce thresholded images that are as variable as, or even more variable than, those of the experts.
These observations suggest that, at this point, analysis of the same image of a soil, be it a simple photograph or 3-dimensional X-ray CT data, by different individuals can lead to very different results, without any assurance that any of them would be even approximately “correct” or best suited to the objective at hand. Several strategies are proposed to cope with this situation, including the use of physical “standards”, adoption of procedures to assess the accuracy of thresholding, benchmarking with physical measurements, and the development of computational methods that do not require binary images.
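To make the thresholding step concrete, the following Python sketch shows one way a greyscale image might be binarized and how a derived quantity such as porosity shifts with the chosen threshold. It is only an illustration under stated assumptions: the image is synthetic (not one of the test images), pores are assumed to be the darker phase, Otsu's method stands in for the many automatic algorithms mentioned above, and the three "manual" thresholds are hypothetical values standing in for choices different observers might make.

```python
import numpy as np


def otsu_threshold(image, nbins=256):
    """Grey level maximizing the between-class variance (Otsu's method)."""
    hist, bin_edges = np.histogram(image.ravel(), bins=nbins)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    counts = hist.astype(float)
    # Class weights below and above each candidate cut.
    w0 = np.cumsum(counts)
    w1 = np.cumsum(counts[::-1])[::-1]
    # Class means below and above each candidate cut (guard against empty classes).
    m0 = np.cumsum(counts * centers) / np.maximum(w0, 1e-12)
    m1 = (np.cumsum((counts * centers)[::-1]) / np.maximum(w1[::-1], 1e-12))[::-1]
    # Between-class variance for a cut placed between bin i and bin i + 1.
    var_between = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return centers[np.argmax(var_between)]


def porosity(image, threshold):
    """Fraction of pixels classified as pore (assumption: pores are darker)."""
    return float(np.mean(image < threshold))


# Synthetic, hypothetical greyscale "soil" image: 30% dark pores, 70% brighter solids.
rng = np.random.default_rng(0)
shape = (256, 256)
is_pore = rng.random(shape) < 0.30
image = np.where(is_pore,
                 rng.normal(60.0, 15.0, shape),
                 rng.normal(160.0, 25.0, shape))
image = np.clip(image, 0.0, 255.0)

t_otsu = otsu_threshold(image)

# Hypothetical thresholds that different observers might pick by eye.
manual_thresholds = [90.0, 110.0, 130.0]
porosities = {t: porosity(image, t) for t in manual_thresholds}
porosities["otsu"] = porosity(image, t_otsu)

for label, value in porosities.items():
    print(f"threshold={label}: porosity={value:.3f}")
```

Even on such a cleanly bimodal synthetic image, the estimated porosity changes with the threshold; real soil images, whose histograms are often far less clearly separated, can be expected to be more sensitive still.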