INFO:root:Compiling cm dataset using only active sequences.
INFO:root:Generating CSV.
INFO:root:Created data/interim/cm/cm.csv.
INFO:root:Pre-computing distances.
INFO:root:Generating 3 partitions.
INFO:root:Partition not possible at threshold 0.35. Less than 25 % of sequences found in a split:
INFO:root:- 46.26 % (303/655) in split 1.
INFO:root:- 30.69 % (201/655) in split 2.
INFO:root:- 23.05 % (151/655) in split 3.
INFO:root:Increasing threshold from 0.35 to 0.4 to achieve balance.
INFO:root:Dataset successfully split at threshold 0.4.
INFO:root:Summary:
INFO:root:- 39.88 % (341/855) in split 1, with p(class=1) = 77.13 %.
INFO:root:- 30.29 % (259/855) in split 2, with p(class=1) = 69.88 %.
INFO:root:- 29.82 % (255/855) in split 3, with p(class=1) = 61.18 %.
INFO:root:Non-active sequences removed.
INFO:root:Final dataset size: 855.
INFO:root:File saved in data/processed/cm/cm.csv