Xiaowei Xu
Corporate Technology, Siemens AG, Otto-Hahn-Ring 6, D-81730 München,
Germany. Xiaowei.Xu@mchp.siemens.de
Jochen Jäger
Institute for Computer Science, University of Munich, Oettingenstr.
67, D-80538 München, Germany.
jaeger@informatik.uni-muenchen.de
Hans-Peter Kriegel
Institute for Computer Science, University of Munich, Oettingenstr.
67, D-80538 München, Germany.
kriegel@informatik.uni-muenchen.de
Abstract
The clustering algorithm DBSCAN relies on a density-based notion of
clusters and is designed to discover clusters of arbitrary shape as well
as
to distinguish noise. In this paper, we present PDBSCAN, a parallel
version of this algorithm. We use the ‘shared-nothing’ architecture with
multiple computers interconnected through a network. A fundamental
component of a shared-nothing system is its distributed data structure.
We introduce the dR*-tree, a distributed spatial index structure in
which the data is spread among multiple computers and the indexes of the
data are replicated on every computer. We implemented our method using
a number of workstations connected via Ethernet (10 Mbit). A
performance evaluation shows that PDBSCAN offers nearly linear speedup
and has excellent scaleup and sizeup behavior.
Keywords
clustering algorithms, parallel algorithms, distributed algorithms,
scalable data mining, distributed index structures, spatial databases
Article ID: 241179