Quantization is a compression technique that reduces the computational cost, memory footprint, and power consumption of deep neural networks (DNNs). In this talk, we focus on GPFQ, a post-training quantization algorithm based on a deterministic greedy path-following mechanism, and on its stochastic variant, SGPFQ. In both cases, we rigorously analyze the associated quantization error bounds and show that, when quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights, i.e., in the level of over-parametrization. To evaluate the method empirically, we quantize several common DNN architectures using only a few bits per weight and test them on ImageNet, observing only a minor loss of accuracy compared to the unquantized models.
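To make the greedy path-following idea concrete, below is a minimal NumPy sketch of a GPFQ-style update for a single neuron: the weights are quantized one at a time, and a running error vector tracks the mismatch between the original and quantized pre-activations on a batch of data. The function name `gpfq_quantize_neuron`, the choice of alphabet, and the small regularizer in the denominator are illustrative assumptions, not the talk's exact implementation.

```python
import numpy as np

def gpfq_quantize_neuron(w, X, alphabet):
    """Greedy path-following quantization (GPFQ-style sketch) of one neuron.

    w        : (N,) weights of a single neuron.
    X        : (m, N) data matrix; column t is the input feature multiplying w[t].
    alphabet : 1-D array of allowed quantized values, e.g. step * {-K, ..., K}.
    Returns quantized weights q of the same shape as w.
    """
    m, N = X.shape
    q = np.zeros(N)
    u = np.zeros(m)  # running error between original and quantized pre-activations
    for t in range(N):
        x_t = X[:, t]
        # Greedy step: project the accumulated error plus the current weight's
        # contribution onto x_t, then round to the nearest alphabet element.
        target = (x_t @ (u + w[t] * x_t)) / (x_t @ x_t + 1e-12)
        q[t] = alphabet[np.argmin(np.abs(alphabet - target))]
        # Path-following update of the error vector.
        u = u + (w[t] - q[t]) * x_t
    return q

# Example (hypothetical data): a ternary alphabet scaled to the weight range.
rng = np.random.default_rng(0)
N, m = 512, 256
w = rng.normal(size=N) / np.sqrt(N)
X = rng.normal(size=(m, N))
step = np.max(np.abs(w))
alphabet = step * np.arange(-1, 2)  # {-step, 0, +step}
q = gpfq_quantize_neuron(w, X, alphabet)
rel_err = np.linalg.norm(X @ (w - q))**2 / np.linalg.norm(X @ w)**2
```

The quantity `rel_err` is the relative square error of the quantized pre-activations, the figure of merit whose decay in N is analyzed in the talk. The stochastic variant, SGPFQ, can be sketched by replacing the deterministic nearest-element rounding with stochastic rounding toward the two surrounding alphabet elements.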