This dataset has been obtained by mapping the variations included in the well-known S2648 dataset (Dehouck et al., 2009) onto full-length UniProt sequences. Proteins sharing more than 30% sequence identity at 40% alignment coverage with any protein in the S669 test set (Pancotti et al., 2022) were excluded.
The final dataset contains 2450 single-point variations endowed with experimental ΔΔG on 115 protein sequences.
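The homology reduction described above can be pictured as a filter over pairwise alignment results. The following is a minimal sketch, not the procedure actually used to build the dataset: the `Hit` record, the `homology_reduce` function, and the choice to treat the coverage threshold as inclusive are all assumptions made for illustration, and the alignment hits are expected to come from an external tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment result (hypothetical record layout)."""
    query_id: str    # candidate training protein
    target_id: str   # protein in the held-out test set
    identity: float  # percent sequence identity over the aligned region
    coverage: float  # percent alignment coverage


def homology_reduce(candidates, hits, max_identity=30.0, min_coverage=40.0):
    """Return the candidates with no hit above the identity threshold.

    A candidate is dropped when at least one hit has identity greater than
    max_identity and coverage of at least min_coverage. Treating the
    coverage bound as inclusive is an assumption of this sketch.
    """
    flagged = {
        h.query_id
        for h in hits
        if h.identity > max_identity and h.coverage >= min_coverage
    }
    # Preserve the input order of the surviving candidates.
    return [c for c in candidates if c not in flagged]


candidates = ["P1", "P2", "P3"]
hits = [
    Hit("P1", "T1", identity=45.0, coverage=80.0),  # too similar: dropped
    Hit("P2", "T1", identity=25.0, coverage=90.0),  # identity too low: kept
    Hit("P3", "T2", identity=60.0, coverage=20.0),  # coverage too low: kept
]
kept = homology_reduce(candidates, hits)
```

With these made-up hits, only `P1` is removed, because it is the only candidate that exceeds both thresholds against a test-set protein.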
This dataset has been obtained by mapping the variations included in the S669 test set (Pancotti et al., 2022) onto full-length UniProt sequences.
The final dataset contains 669 single-point variations endowed with experimental ΔΔG on 87 unique protein sequences.
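When variations are mapped onto full-length sequences, a simple sanity check is that the wild-type residue of each variation matches the sequence at the mapped position. The sketch below illustrates that check only. It assumes variations are written as wild-type residue, 1-based position, and mutant residue (for example `K2A`), which may differ from the format used in the distributed files, and the function names are hypothetical.

```python
import re

# Assumed notation: one-letter wild type, 1-based position, one-letter mutant.
_VARIATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variation(variation):
    """Split a string such as 'K2A' into ('K', 2, 'A')."""
    match = _VARIATION.match(variation)
    if match is None:
        raise ValueError(f"unrecognised variation: {variation!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def is_consistent(sequence, variation):
    """True if the wild-type residue is found at the stated position."""
    wild_type, position, _ = parse_variation(variation)
    if not 1 <= position <= len(sequence):
        return False
    # Positions are 1-based, Python string indices are 0-based.
    return sequence[position - 1] == wild_type


sequence = "MKTAYIAK"  # toy sequence, not taken from the dataset
ok = is_consistent(sequence, "K2A")
mismatch = is_consistent(sequence, "T2A")
```

Here `ok` is true because position 2 of the toy sequence is `K`, while `mismatch` is false because the stated wild type `T` does not occur at that position.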
This dataset has been obtained by mapping the multi-site variations included in the PTmul test set (Montanucci et al., 2019) onto full-length UniProt sequences. The dataset has also been homology-reduced (30% sequence identity at 40% alignment coverage) with respect to the S2450 training set.
The final dataset contains 82 multi-point variations endowed with experimental ΔΔG on 13 protein sequences.