Application of a novel TabPFN model for tabular data classification
DOI:
https://doi.org/10.15276/ict.02.2025.07
Keywords:
Tabular data, machine learning, classification, regression, gradient boosting decision trees, generative transformer model, in-context learning, two-way attention mechanism
Abstract
Tabular data remains the most commonly used form of data and is ubiquitous across fields such as medicine, finance, manufacturing, economics, public governance, and climate science, so developing new methods for the classification and regression analysis of tabular datasets remains highly relevant. Although deep learning has revolutionized learning from raw data in domains like computer vision and natural language processing, tabular data presents a unique set of challenges that prevent conventional neural-network-based models from being immediately effective. In our study, we examine the novel TabPFN v2 (Tabular Prior-Data Fitted Network) model developed by Prior Labs, which promises highly accurate predictions on small- to medium-sized datasets without extensive tuning or data preprocessing. TabPFN is a generative, transformer-based foundation model that leverages the same mechanisms that have driven the remarkable success of large language models to produce a powerful tabular prediction algorithm. It is pre-trained on a large corpus of diverse synthetic tabular datasets and employs in-context learning with a bidirectional attention mechanism to address key limitations of existing deep learning models when analyzing row-column structured data. Applying TabPFN to a real-world task of classifying supply records for risk assessment, we found that, when used within its specified limits, this model can outperform established state-of-the-art gradient-boosted decision tree models. We also explored the optimization options available in TabPFN and evaluated them in experiments on our real-world data. Overall, TabPFN is a powerful example of how transformer model principles can be adapted to row-column organized data. Although it is not a one-size-fits-all solution, TabPFN is certainly worth including in the toolkit for tabular data analysis.