Multiuser detection (MUD) strategies have the potential to significantly increase the capacity of wireless communications systems, but for these to be useful they must also be practical for implementation in VLSI circuits that cope with real world situations and process data in real time. The application of the optimal Maximum Likelihood (ML) solution is limited by its exponential computational complexity. Current near-ML algorithms such as the spherical detector offer significantly reduced complexity, but are still likely to be impractical for larger problems. Simpler MUD algorithms, such as the interference canceller (IC), are more practical but offer poor performance in some cases of interest. In this paper, we present a unique algorithm that shares aspects of these well known techniques, but offers higher performance than the simpler methods, while remaining practical. Our approach involves a series of iterations that that gradually converge to a decision, by choosing the estimates that produce lower values for our ML cost function. While this is more complex than well known methods, it offers higher bit error rate performance and is lower in complexity that other methods with comparable performance. We also show how to further reduce the complexity to make our algorithm suitable for VLSI implementation.