pLM4CPPs

Protein Language Model-Based Predictor for Cell Penetrating Peptides

pLM4CPPs is a deep learning architecture for predicting cell-penetrating peptides (CPPs). It builds on pretrained protein language models (pLMs) trained on large protein sequence corpora. The embeddings these models produce capture sequence relationships and functional motifs relevant to CPP activity, which improves classification accuracy and reliability.

pLM4CPPs passes the embeddings to Convolutional Neural Networks (CNNs) for hierarchical feature extraction from peptide sequences, achieving high accuracy, Matthews correlation coefficient (MCC), and sensitivity on the test dataset. Peptide embeddings from BEPLER, CPCProt, SeqVec, ESM variants (ESM, ESM-2, ESM-1b, ESM-1v), ProtT5-XL UniRef50, and ProtT5-XL BFD were evaluated to optimize performance across diverse datasets. pLM4CPPs combines the predictions of multiple models into a consensus decision on CPP classification, which makes the results more robust.

This platform implements the paper: Kumar, N.; Du, Z.; Li, Y. pLM4CPPs: Protein Language Model-Based Predictor for Cell Penetrating Peptides, J. Chem. Inf. Model. 2024 (Submitted).
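The consensus step mentioned above can be illustrated with a simple majority vote. This is only a sketch: the description does not state how pLM4CPPs combines model outputs, so the majority rule, the tie-breaking choice, and the function name `consensus_label` are assumptions made for illustration.

```python
def consensus_label(votes):
    """Return the majority label among per-model votes (1 = CPP, 0 = non-CPP).

    Assumption for this sketch: ties are resolved as non-CPP (0).
    """
    if not votes:
        raise ValueError("at least one model vote is required")
    positives = sum(1 for vote in votes if vote == 1)
    # A strict majority of positive votes is needed to call the peptide a CPP.
    return 1 if positives * 2 > len(votes) else 0


# Hypothetical per-model predictions for a single peptide.
model_votes = {"ESM-1280": 1, "ESM-480": 1, "SeqVec": 0}
print(consensus_label(list(model_votes.values())))  # 1
```

Averaging predicted probabilities and thresholding the mean would be an equally plausible way to reach a consensus; the vote is shown here only because it is the simplest to read.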


Usage of the Web Server:


Quick Output Version:

Select a model, input peptide sequences, and click "Run" for quick predictions.

Note: Multiple sequences can be entered as a comma-separated list (e.g., "VPP,IPP,CCL,AGR").
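Before pasting a long list into the input box, it can help to check it locally. The sketch below splits a comma-separated string and rejects anything outside the 20 standard amino acid letters. Whether the web server applies the same restriction is an assumption, and `parse_peptides` is a hypothetical helper, not part of pLM4CPPs.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_peptides(raw):
    """Split a comma-separated string into uppercase peptide sequences."""
    peptides = []
    for token in raw.split(","):
        seq = token.strip().upper()
        if not seq:
            continue  # skip empty entries such as a trailing comma
        invalid = set(seq) - STANDARD_AMINO_ACIDS
        if invalid:
            raise ValueError(
                f"{seq!r} contains non-standard residues: {sorted(invalid)}"
            )
        peptides.append(seq)
    return peptides


print(parse_peptides("VPP,IPP,CCL,AGR"))  # ['VPP', 'IPP', 'CCL', 'AGR']
```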

Large-scale Output Version:

Upload files (xls, xlsx, txt, fasta) and click "Run" for batch predictions.

Note: File preparation guidelines are available in the repository.
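As an illustration of preparing a batch file, the sketch below formats a list of peptides as FASTA records. The record identifiers (`pep1`, `pep2`, ...) and the helper name `to_fasta` are assumptions; the layout the server actually expects is defined by the repository guidelines, which take precedence over this example.

```python
def to_fasta(peptides, prefix="pep"):
    """Format peptide sequences as FASTA text with generated identifiers."""
    return "".join(
        f">{prefix}{index}\n{seq}\n"
        for index, seq in enumerate(peptides, start=1)
    )


fasta_text = to_fasta(["VPP", "IPP", "CCL", "AGR"])
print(fasta_text, end="")
# To save the result for upload, write fasta_text to a file, e.g. "peptides.fasta".
```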

Model Performance on Test Dataset:


Model                     ACC    BACC   Sn     Sp     MCC
pLM4CPPs (ESM-1280)       0.929  0.893  0.820  0.966  0.808
pLM4CPPs (ESM-640)        0.923  0.880  0.791  0.968  0.792
pLM4CPPs (ESM-480)        0.931  0.907  0.860  0.955  0.816
pLM4CPPs (ESM-320)        0.923  0.892  0.831  0.955  0.795
pLM4CPPs (ProtT5-XL BFD)  0.921  0.891  0.831  0.951  0.789
pLM4CPPs (SeqVec)         0.932  0.901  0.838  0.965  0.819
Note: ACC = accuracy, BACC = balanced accuracy, Sn = sensitivity, Sp = specificity, MCC = Matthews correlation coefficient.
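For readers less familiar with these metrics, the sketch below computes all five from the counts of a binary confusion matrix using their standard definitions. The counts in the example are hypothetical and are not taken from the pLM4CPPs test set.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, BACC, Sn, Sp and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)  # sensitivity: fraction of true CPPs recovered
    sp = tn / (tn + fp)  # specificity: fraction of non-CPPs rejected
    acc = (tp + tn) / (tp + tn + fp + fn)
    bacc = (sn + sp) / 2  # balanced accuracy: mean of Sn and Sp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "BACC": bacc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts, for illustration only.
for name, value in classification_metrics(tp=80, tn=90, fp=10, fn=20).items():
    print(f"{name}: {value:.3f}")
```

Because BACC is the mean of Sn and Sp, it can be cross-checked directly against the table: for ESM-1280, (0.820 + 0.966) / 2 = 0.893.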

Schematic framework of pLM4CPPs:


[Figure: Whole architecture]

[Figure: pLM4CPPs & Independent Dataset]


Contact and Support


Nandan Kumar (nandan@ksu.edu)

Zhenjiao Du (zhenjiao@ksu.edu)

Yonghui Li (yonghui@ksu.edu)