An improved VLSI architecture for a high-speed Viterbi decoder is proposed. We partition the datapath of the Viterbi decoder into three main pipeline stages. To reduce the operation overhead of the add-compare-select unit (ACSU), we remove the minimum metric selection logic and rescale the metrics with a constant subtraction scheme, which is made possible by unsigned arithmetic and an overflow detection unit. We also show, through an analysis of truncation effects, that the minimum metric selection logic is unnecessary: simulation results demonstrate that if the traceback depth is long enough, arbitrary state decoding can be used with little disadvantage relative to best state decoding. A pipelined survivor memory unit (SMU) architecture based on a modified traceback algorithm is also presented. A one-stage pipeline cell is built from two registers and multiplexers; by cascading these cells, the traceback operation is achieved without a LIFO or a complex memory controller, with a latency of only 2T.
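The constant subtraction idea can be illustrated with a minimal software sketch. It is not the proposed circuit: the register width `WIDTH`, the rescaling constant `HALF`, and the trigger condition (all metrics have their most significant bit set) are assumptions chosen for illustration, as are the function names `acs` and `rescale`. The point it shows is that rescaling by a fixed constant needs no search for the minimum metric and leaves the differences between path metrics unchanged.

```python
WIDTH = 8                # assumed width of the unsigned path metric registers
HALF = 1 << (WIDTH - 1)  # assumed rescaling constant: the weight of the MSB


def acs(pm_a, bm_a, pm_b, bm_b):
    """Add-compare-select for one state.

    Adds each predecessor's path metric to its branch metric, compares the
    two candidates, and returns (survivor metric, decision bit).
    """
    cand_a = pm_a + bm_a
    cand_b = pm_b + bm_b
    return (cand_a, 0) if cand_a <= cand_b else (cand_b, 1)


def rescale(metrics):
    """Subtract the constant HALF from every metric once all MSBs are set.

    In hardware the condition is an AND over the MSBs, so no minimum
    metric has to be found. Metric differences are preserved.
    """
    if all(m >= HALF for m in metrics):
        return [m - HALF for m in metrics]
    return list(metrics)
```

For example, `rescale([130, 200, 128])` lowers every metric by 128, while `rescale([130, 100])` leaves the metrics untouched because one of them is still below the threshold. The sketch stays within `WIDTH` bits only if the spread between the largest and smallest metric plus the largest branch metric is below `HALF`, which is an assumption about the code and quantization rather than something stated in the text.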