cNLS Mapper

Help Page

[1] About cNLS Mapper

cNLS Mapper accurately predicts nuclear localization signals (NLSs) specific to the importin α/β pathway by calculating NLS scores (levels of NLS activity), rather than by a conventional sequence similarity search or a machine learning strategy. The NLS scores are calculated with four NLS profiles (for class 1/2, class 3, class 4, and bipartite NLSs), each of which represents the contribution of every amino acid residue at every position within an NLS class to the overall NLS activity (1). These profiles were generated by an extensive amino acid replacement analysis for each NLS class in budding yeast. We have found that each residue within an NLS, in most cases, contributes additively and independently to the overall activity (1, 2). Thus, NLS activities (scores) can be estimated by serially adding the positive or negative contribution scores given in an NLS profile. The additive property of NLS motifs also allowed us to design potent peptide inhibitors specific to the importin α/β nuclear import pathway by serially selecting amino acids with a high contribution score in an NLS profile (2).
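The additive scoring idea can be illustrated with a minimal Python sketch. The profile values, the name TOY_PROFILE, and the default score of 0 for unlisted residues are all made up for illustration; they are not the actual cNLS Mapper profiles, which are not reproduced on this page.

```python
# Hypothetical contribution scores: one dict per position, residue -> score.
# These numbers are invented for illustration only.
TOY_PROFILE = [
    {"K": 3.0, "R": 2.5},
    {"K": 2.0, "R": 2.0, "P": -1.0},
    {"K": 1.5, "R": 1.0, "D": -2.0},
]

def additive_score(segment, profile, default=0.0):
    """Sum the per-position contribution of each residue in the segment."""
    if len(segment) != len(profile):
        raise ValueError("segment and profile must have the same length")
    # Residues missing from a column get the (assumed) default contribution.
    return sum(column.get(residue, default)
               for residue, column in zip(segment, profile))

print(additive_score("KKR", TOY_PROFILE))  # 3.0 + 2.0 + 1.0 = 6.0
print(additive_score("KPD", TOY_PROFILE))  # 3.0 - 1.0 - 2.0 = 0.0
```

Because each position contributes independently, replacing one residue changes the total only by the difference between the two contribution scores at that position.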

It should be noted that the NLS profiles were generated by nuclear import assays in budding yeast and, thus, NLS prediction for other species may be less effective than that for yeast, although the importin α/β pathway is highly conserved in eukaryotes. In addition, cNLS Mapper cannot predict importin α-independent NLSs (NLSs directly recognized by several importin β members).

[2] How to use cNLS Mapper

(a) Input sequence

Any sequence format, including plain text, is acceptable. Any characters other than letters are ignored, but non-amino acid codes, such as b, x, and z, are prohibited. The minimum and maximum acceptable lengths of the input sequence are 11 and 4999 amino acids, respectively. A single sequence can be pasted into the text box or supplied as a text file containing a sequence in FASTA or raw format.
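If you want to check a sequence before submitting it, the rules above can be approximated as follows. This is only a sketch: the function name clean_sequence is hypothetical, and restricting input to the 20 standard one-letter codes is an assumption (the page names only b, x, and z as examples of prohibited codes).

```python
import re

MIN_LEN, MAX_LEN = 11, 4999             # limits stated on this page
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: only these 20 codes pass

def clean_sequence(text):
    """Return an upper-case sequence from FASTA or raw text, or raise ValueError."""
    # Skip FASTA header lines, which start with '>'.
    body = "".join(ln for ln in text.splitlines() if not ln.startswith(">"))
    # Characters other than letters (digits, spaces, punctuation) are ignored.
    seq = re.sub(r"[^A-Za-z]", "", body).upper()
    invalid = sorted(set(seq) - STANDARD)
    if invalid:
        raise ValueError("non-amino acid codes: " + "".join(invalid))
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        raise ValueError(f"length {len(seq)} is outside {MIN_LEN}-{MAX_LEN}")
    return seq

print(clean_sequence(">example\nMKRPAAT KKAG\n12 QAKKKK"))  # MKRPAATKKAGQAKKKK
```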

(b) Cut-off score

cNLS Mapper extracts putative NLS sequences with a score equal to or greater than the selected cut-off score. Higher scores indicate stronger NLS activities. Briefly, a GUS-GFP reporter protein fused to an NLS with a score of 8, 9, or 10 is exclusively localized to the nucleus, one with a score of 7 or 8 is partially localized to the nucleus, one with a score of 3, 4, or 5 is localized to both the nucleus and the cytoplasm, and one with a score of 1 or 2 is localized to the cytoplasm (see Supplemental Figure S1 in ref. 1 or 3 for details).
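The cut-off works as a simple threshold on the calculated scores. A minimal sketch, assuming each candidate is a hypothetical (start, end, score) tuple with invented values:

```python
def select_hits(hits, cutoff):
    """Keep hits scoring at or above the cut-off, strongest first."""
    kept = [hit for hit in hits if hit[2] >= cutoff]
    return sorted(kept, key=lambda hit: hit[2], reverse=True)

# Invented example candidates: (start, end, score).
example_hits = [(10, 26, 4.5), (40, 56, 9.0), (80, 96, 5.0)]
print(select_hits(example_hits, 5.0))  # [(40, 56, 9.0), (80, 96, 5.0)]
```

Lowering the cut-off returns more, weaker candidates; raising it keeps only the sequences expected to direct strong nuclear localization.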

(c) Search for bipartite NLSs with a long linker

The optimal length of the linker of bipartite NLSs is 10-12 amino acids, as demonstrated earlier. However, we have observed that bipartite NLSs with linkers of up to at least 20 amino acids are functional in an unstructured flexible region, although a longer linker decreases the NLS activity. Because unstructured regions of proteins are generally found in the N- and C-terminal regions, the protein regions searched for bipartite NLSs with a long linker are restricted, by default, to the terminal 60-amino acid regions. However, you can select an option to search the entire protein for long bipartite NLSs. This option may also help to identify a structure-dependent bipartite NLS, whose terminal basic cores are located close to each other in the tertiary structure.
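The default restriction to the two terminal 60-residue regions can be pictured with the sketch below. The function name and the handling of short proteins (searching the whole sequence when the two terminal regions would overlap) are assumptions made for illustration, not a description of the server's code.

```python
def long_linker_search_regions(seq_len, terminal=60, whole_protein=False):
    """Return 0-based, half-open (start, end) regions to scan for long bipartite NLSs."""
    # Assumption: when the terminal regions would meet or overlap, scan everything.
    if whole_protein or seq_len <= 2 * terminal:
        return [(0, seq_len)]
    return [(0, terminal), (seq_len - terminal, seq_len)]

print(long_linker_search_regions(200))                      # [(0, 60), (140, 200)]
print(long_linker_search_regions(200, whole_protein=True))  # [(0, 200)]
```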

(d) Search result

A result page will appear immediately after you click the "Predict NLS" (submit) button. In the upper part of the result page, your input sequence is presented, with any predicted NLSs highlighted in red. In the lower part, the predicted NLSs are divided into two classes, monopartite and bipartite NLSs, and their sequences, positions in the protein, and calculated scores are listed. The program eliminates the flanking residues that do not affect the NLS activity from the initially predicted sequences to display only essential sequences.

cNLS Mapper often displays several similar overlapping NLSs with different scores. This is because the program scans the protein sequence with a window size of 16 amino acid residues for monopartite NLSs and a window size of 26-28 amino acid residues for bipartite NLSs (26-36 for long bipartite NLSs), shifting by one amino acid at a time and calculating the score for every window, so that basic-rich regions often exhibit high scores in multiple scan windows. Thus, when multiple overlapping NLS sequences are displayed, the sequence with the highest score is, in most cases, considered to be the true NLS.
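The scan and the "take the highest-scoring overlapping hit" rule of thumb can be sketched as follows. The scoring function here simply counts K and R residues and is a stand-in, not the profile-based score; the function names are hypothetical.

```python
def scan_windows(seq, window, score_fn):
    """Score every window of the given size, shifting by one residue."""
    return [(i, i + window, score_fn(seq[i:i + window]))
            for i in range(len(seq) - window + 1)]

def best_of_overlapping(hits):
    """Group hits whose spans overlap and keep the top-scoring hit of each group."""
    best = []
    group_end = None
    for hit in sorted(hits):
        start, end, score = hit
        if group_end is not None and start < group_end:
            group_end = max(group_end, end)   # extend the current group
            if score > best[-1][2]:
                best[-1] = hit                # new best hit in this group
        else:
            best.append(hit)                  # start a new group
            group_end = end
    return best

def count_basic(segment):
    # Stand-in score: number of K/R residues (NOT the cNLS Mapper score).
    return sum(residue in "KR" for residue in segment)

print(scan_windows("AKRKA", 3, count_basic))  # [(0, 3, 2), (1, 4, 3), (2, 5, 2)]
hits = [(0, 16, 5.0), (1, 17, 8.5), (2, 18, 7.0), (40, 56, 6.0)]
print(best_of_overlapping(hits))              # [(1, 17, 8.5), (40, 56, 6.0)]
```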

APPENDIX: a brief overview of nuclear import and export of proteins

Importin α-dependent and -independent NLSs

Nuclear import and export of macromolecules are mediated by receptors called karyopherins, also known as importins and exportins. Nuclear import of proteins is generally initiated by the formation of a ternary complex of importin α, importin β1, and a cargo, where importin α serves as an adapter for importin β1 and recognizes nuclear localization signals (NLSs) within the cargo. On the other hand, several importin β members, including importin β1 and importin β2 (also known as Transportin), directly bind some specific classes of NLSs.

Classical NLS

The classical NLSs rich in basic amino acids are known as NLSs recognized by importin α, and are classified into two major classes, monopartite and bipartite NLSs. Monopartite NLSs contain a single cluster of basic residues and are divided into two subclasses; one with at least 4 consecutive basic amino acids (class 1) and the other with three basic amino acids, represented by K(K/R)X(K/R) as a putative consensus sequence where X indicates any amino acid (class 2). These two classes are exemplified by the SV40 large T antigen NLS (PKKKRKV) and the c-Myc NLS (PAAKRVKLD). Bipartite NLSs contain two clusters of basic residues separated by a 10-12 amino acid linker and are exemplified by the nucleoplasmin NLS (KRPAATKKAGQAKKKK). A putative consensus sequence of the bipartite NLS has been defined as (K/R)(K/R)X10-12(K/R)3/5 where X indicates any amino acid and (K/R)3/5 represents at least three of either lysine or arginine out of five consecutive amino acids.
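The consensus patterns above translate directly into simple pattern matching. The sketch below is one possible reading of them, not part of cNLS Mapper, and all names are hypothetical. Because the (K/R)3/5 rule needs a full five-residue window, the nucleoplasmin example is padded with two alanines on each side, which are invented flanking residues.

```python
import re

CLASS1 = re.compile(r"[KR]{4,}")    # at least 4 consecutive basic residues
CLASS2 = re.compile(r"K[KR].[KR]")  # K(K/R)X(K/R)

def find_bipartite(seq):
    """Return (start, end) spans matching (K/R)(K/R)X10-12(K/R)3/5."""
    hits = []
    for i in range(len(seq) - 1):
        if seq[i] in "KR" and seq[i + 1] in "KR":
            for linker in (10, 11, 12):
                tail_start = i + 2 + linker
                tail = seq[tail_start:tail_start + 5]
                # At least three K/R within five consecutive residues.
                if len(tail) == 5 and sum(c in "KR" for c in tail) >= 3:
                    hits.append((i, tail_start + 5))
    return hits

print(CLASS1.search("PKKKRKV").group())    # KKKRK  (SV40 large T antigen NLS)
print(CLASS2.search("PAAKRVKLD").group())  # KRVK   (c-Myc NLS)
print(find_bipartite("AA" + "KRPAATKKAGQAKKKK" + "AA"))  # [(2, 19), (2, 20)]
```

The two bipartite spans share the same N-terminal KR and differ only in linker length (10 or 11), which shows how a single basic-rich region can satisfy the pattern in more than one way.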

Although these consensus sequences are widely accepted, a considerable number of non-functional sequences match them. When translated ORFs (5869 proteins) from the budding yeast genome were searched using the consensus NLS patterns, over 3000 and 1500 independent proteins were found to match the monopartite and bipartite patterns perfectly, respectively. This substantially exceeds the expected number of nuclear proteins, which is estimated to be ~30% of all yeast proteins. Thus, the currently prevailing consensus for the classical NLSs is incomplete, and there must be more specific consensus sequences.

Noncanonical NLSs recognized by importin α

Many noncanonical NLSs that do not match the classical NLSs have been defined. Kosugi et al. have reported three classes of noncanonical monopartite NLSs (classes 3-5) that bind directly to importin α (3). Class 3 and class 4 NLSs have KRX(W/F/Y)XXAF and (P/R)XXKR(K/R) core sequences, respectively, and specifically bind to the minor binding pocket of importin α. This contrasts with the classical class 1 and class 2 NLSs, which specifically bind to the major binding pocket of importin α. It is suggested that a bipartite NLS is a hybrid sequence consisting of imperfect class 1/2 and class 3/4 NLSs, because the N-terminal and C-terminal basic regions of bipartite NLSs bind to the minor and major binding pockets of importin α, respectively (4). Class 5 NLSs have an LGKR(K/R)(W/F/Y) core sequence and are specific to plants, in contrast with the other classes, all of which are functional in yeast, plants, and mammals (3).
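The three noncanonical core sequences can be written as regular expressions. This sketch only tests for the presence of a core motif and says nothing about NLS activity; the test sequences are synthetic and the function name is hypothetical.

```python
import re

CORE_PATTERNS = {
    "class 3": re.compile(r"KR.[WFY]..AF"),  # KRX(W/F/Y)XXAF
    "class 4": re.compile(r"[PR]..KR[KR]"),  # (P/R)XXKR(K/R)
    "class 5": re.compile(r"LGKR[KR][WFY]"), # LGKR(K/R)(W/F/Y)
}

def matching_classes(seq):
    """List the noncanonical classes whose core sequence occurs in seq."""
    return [name for name, pattern in CORE_PATTERNS.items()
            if pattern.search(seq)]

# Synthetic sequences built to contain exactly one core each.
print(matching_classes("AAKRAWAAAFAA"))  # ['class 3']
print(matching_classes("GGPAAKRKGG"))    # ['class 4']
print(matching_classes("GGLGKRKWGG"))    # ['class 5']
```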

Noncanonical NLSs recognized by importin β members

Although this NLS prediction program deals only with importin α-dependent NLSs, there are a number of noncanonical NLSs that directly bind to importin βs (5). These NLSs are generally longer and more variable than the classical NLSs. The consensus sequence of the NLSs recognized by importin β1 has not been established, but many of the NLSs are rich in arginine, as exemplified by the HTLV-1 Rex NLS (MPKTRRRPRRSQRKRPPT). On the other hand, it has been reported that the consensus sequences of importin β2 (Transportin)-dependent NLSs are represented by patterns with overall basic character that contain a central hydrophobic or basic motif followed by a C-terminal R/H/KX(2-5)PY consensus sequence (6).
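The C-terminal R/H/KX(2-5)PY element on its own can be matched with a short regular expression. This sketch covers only that element, not the upstream hydrophobic or basic motif or the overall basic character, and the test sequences are synthetic.

```python
import re

PY_MOTIF = re.compile(r"[RHK].{2,5}PY")  # R/H/K, then 2-5 residues, then PY

def find_py_motifs(seq):
    """Return (start, matched text) for each non-overlapping R/H/KX(2-5)PY match."""
    return [(m.start(), m.group()) for m in PY_MOTIF.finditer(seq)]

print(find_py_motifs("GGRSSGPYGG"))  # [(2, 'RSSGPY')]
print(find_py_motifs("GGRAPYGG"))    # [] (only one residue between R and PY)
```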

Nuclear export signals

Many nuclear proteins contain both NLSs and nuclear export signals (NESs) and shuttle between the nucleus and the cytoplasm through coordination of these transport signals. Nuclear export of proteins occurs through the classical nuclear export pathway mediated by an evolutionarily conserved CRM1/exportin protein and through the nonclassical export pathways mediated by other importin β members such as Msn5 (7). In the classical nuclear export pathway, the CRM1-Ran-GTP complex binds directly to the NES contained in the cargo and directs the export of the ternary complex from the nucleus. NESs that are recognized by CRM1 (i.e., leucine-rich NESs) typically contain large conserved hydrophobic residues separated by a variable number of amino acids, given by the traditional consensus sequence L-X(2,3)-[LIVFM]-X(2,3)-L-X-[LI], where X(2,3) represents any two or three amino acids. A recent study has demonstrated that the classical NESs can be classified into six patterns according to the hydrophobic spacing and has established NES consensus sequences that fit most experimentally confirmed NESs and are more reliable than the traditional consensus sequence (8).
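The traditional NES consensus maps onto a regular expression almost symbol for symbol. The sketch below implements only this traditional pattern, not the six refined patterns of ref. 8, and uses synthetic test sequences.

```python
import re

# L-X(2,3)-[LIVFM]-X(2,3)-L-X-[LI]
TRADITIONAL_NES = re.compile(r"L.{2,3}[LIVFM].{2,3}L.[LI]")

def find_nes(seq):
    """Return the first match to the traditional NES consensus, or None."""
    match = TRADITIONAL_NES.search(seq)
    return match.group() if match else None

print(find_nes("GGLAAVAALAIGG"))  # LAAVAALAI
print(find_nes("GGLAVAALAIGG"))   # None (first spacer is only one residue)
```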

References

1. Kosugi S., Hasebe M., Tomita M., and Yanagawa H. (2009) Systematic identification of yeast cell cycle-dependent nucleocytoplasmic shuttling proteins by prediction of composite motifs. Proc. Natl. Acad. Sci. USA 106, 10171-10176.

2. Kosugi S., Hasebe M., Entani T., Takayama S., Tomita M., and Yanagawa H. (2008) Chem. Biol. 15, 940-949.

3. Kosugi S., Hasebe M., Matsumura N., Takashima H., Miyamoto-Sato E., Tomita M., and Yanagawa H. (2009) J. Biol. Chem. 284, 478-485.

4. Conti E., Uy M., Leighton L., Blobel G., and Kuriyan J. (1998) Cell 94, 193-204.

5. Harel A. and Forbes D.J. (2004) Mol. Cell 16, 319-330.

6. Lee B.J., Cansizoglu A.E., Suel K.E., Louis T.H., Zhang Z., and Chook Y.M. (2006) Cell 126, 543-558.

7. Ossareh-Nazari B., Gwizdek C., and Dargemont C. (2001) Traffic 2, 684-689.

8. Kosugi S., Hasebe M., Tomita M., and Yanagawa H. (2008) Traffic 9, 2053-2062.