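Bio.UniGene package
Module contents
Parse Unigene flat file format files such as the Hs.data file.
Here is an overview of the flat file format that this parser deals with:
Line types/qualifiers: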
ID            UniGene cluster ID
TITLE         Title for the cluster
GENE          Gene symbol
CYTOBAND      Cytological band
EXPRESS       Tissues of origin for ESTs in cluster
RESTR_EXPR    Single tissue or development stage contributes more than half the total EST frequency for this gene.
GNM_TERMINUS  genomic confirmation of presence of a 3' terminus; T if a non-templated polyA tail is found among a cluster's sequences; else I if templated As are found in genomic sequence or S if a canonical polyA signal is found on the genomic sequence
GENE_ID       Entrez gene identifier associated with at least one sequence in this cluster; to be used instead of LocusLink.
LOCUSLINK     LocusLink identifier associated with at least one sequence in this cluster; deprecated in favor of GENE_ID
HOMOL         Homology
CHROMOSOME    Chromosome. For plants, CHROMOSOME refers to mapping on the arabidopsis genome.
STS           STS
    ACC=      GenBank/EMBL/DDBJ accession number of STS [optional field]
    UNISTS=   identifier in NCBI's UNISTS database
TXMAP         Transcript map interval
    MARKER=   Marker found on at least one sequence in this cluster
    RHPANEL=  Radiation Hybrid panel used to place marker
PROTSIM       Protein Similarity data for the sequence with highest-scoring protein similarity in this cluster
    ORG=      Organism
    PROTGI=   Sequence GI of protein
    PROTID=   Sequence ID of protein
    PCT=      Percent alignment
    ALN=      length of aligned region (aa)
SCOUNT        Number of sequences in the cluster
SEQUENCE      Sequence
    ACC=      GenBank/EMBL/DDBJ accession number of sequence
    NID=      Unique nucleotide sequence identifier (gi)
    PID=      Unique protein sequence identifier (used for non-ESTs)
    CLONE=    Clone identifier (used for ESTs only)
    END=      End (5'/3') of clone insert read (used for ESTs only)
    LID=      Library ID; see Hs.lib.info for library name and tissue
    MGC=      5' CDS-completeness indicator; if present, the clone associated with this sequence is believed CDS-complete. A value greater than 511 is the gi of the CDS-complete mRNA matched by the EST, otherwise the value is an indicator of the reliability of the test indicating CDS completeness; higher values indicate more reliable CDS-completeness predictions.
    SEQTYPE=  Description of the nucleotide sequence. Possible values are mRNA, EST and HTC.
    TRACE=    The Trace ID of the EST sequence, as provided by NCBI Trace Archive
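To make the layout above concrete, here is a minimal sketch of splitting one flat-file line into its line type and value. It uses only the standard library; `split_line` is a hypothetical helper written for this illustration, not a function of Bio.UniGene, and the sample lines are made up rather than taken from a real Hs.data file.

```python
def split_line(line):
    """Split one flat-file line into (line_type, value).

    Hypothetical helper for illustration; not part of Bio.UniGene.
    """
    # The line type is the first whitespace-delimited token;
    # everything after it is treated as the value.
    parts = line.rstrip("\n").split(None, 1)
    if not parts:
        return "", ""
    line_type = parts[0]
    value = parts[1].strip() if len(parts) > 1 else ""
    return line_type, value


# Made-up sample lines, not taken from a real Hs.data file.
print(split_line("TITLE       Hypothetical example cluster"))
print(split_line("SCOUNT      2"))
```

Lines such as STS, PROTSIM and SEQUENCE carry further `KEY=value` qualifiers inside the value part, which is why they get their own classes below.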
- class Bio.UniGene.SequenceLine(text=None)
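Bases:
object
Store the information for one SEQUENCE line from a Unigene file.
Initialize with the text part of the SEQUENCE line, or nothing.
- Attributes and descriptions (access as lowercase)
ACC= GenBank/EMBL/DDBJ accession number of sequence
NID= Unique nucleotide sequence identifier (gi)
PID= Unique protein sequence identifier (used for non-ESTs)
CLONE= Clone identifier (used for ESTs only)
END= End (5'/3') of clone insert read (used for ESTs only)
LID= Library ID; see Hs.lib.info for library name and tissue
MGC= 5' CDS-completeness indicator; if present, the clone associated with this sequence is believed CDS-complete. A value greater than 511 is the gi of the CDS-complete mRNA matched by the EST, otherwise the value is an indicator of the reliability of the test indicating CDS completeness; higher values indicate more reliable CDS-completeness predictions.
SEQTYPE= Description of the nucleotide sequence. Possible values are mRNA, EST and HTC.
TRACE= The Trace ID of the EST sequence, as provided by NCBI Trace Archive
- __init__(text=None)
Initialize the class.
- __repr__()
Return UniGene SequenceLine object as a string.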
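The idea of exposing each qualifier as a lowercase attribute can be sketched with the standard library alone. This is not the Bio.UniGene implementation: `parse_qualifiers` is a hypothetical helper, it assumes qualifiers are separated by `;` and have the form `KEY=value`, and the sample values are placeholders rather than real identifiers.

```python
from types import SimpleNamespace


def parse_qualifiers(text):
    """Turn 'KEY=value; KEY=value' text into an object with lowercase attributes.

    Hypothetical helper for illustration; not the Bio.UniGene implementation.
    Assumes qualifiers are separated by ';' and each has the form KEY=value.
    """
    fields = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. after a trailing ';'
        key, _, value = part.partition("=")
        fields[key.strip().lower()] = value.strip()
    return SimpleNamespace(**fields)


# Made-up SEQUENCE text; the values are placeholders, not real identifiers.
seq = parse_qualifiers("ACC=X00000.1; NID=g0000001; SEQTYPE=mRNA")
print(seq.acc, seq.nid, seq.seqtype)
```

The same pattern would apply to the qualifiers of PROTSIM and STS lines, since they use the same `KEY=value` form.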
- class Bio.UniGene.ProtsimLine(text=None)
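Bases:
object
Store the information for one PROTSIM line from a Unigene file.
Initialize with the text part of the PROTSIM line, or nothing.
Attributes and descriptions (access as lowercase)
ORG= Organism
PROTGI= Sequence GI of protein
PROTID= Sequence ID of protein
PCT= Percent alignment
ALN= Length of aligned region (aa)
- __init__(text=None)
Initialize the class.
- __repr__()
Return UniGene ProtsimLine object as a string.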
- class Bio.UniGene.STSLine(text=None)
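Bases:
object
Store the information for one STS line from a Unigene file.
Initialize with the text part of the STS line, or nothing.
Attributes and descriptions (access as lowercase)
ACC= GenBank/EMBL/DDBJ accession number of STS [optional field]
UNISTS= Identifier in NCBI's UNISTS database
- __init__(text=None)
Initialize the class.
- __repr__()
Return UniGene STSLine object as a string.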
- class Bio.UniGene.Record
Bases:
object
Store a Unigene record.
Here is what is stored:
    self.ID = ''            # ID line
    self.species = ''       # Hs, Bt, etc.
    self.title = ''         # TITLE line
    self.symbol = ''        # GENE line
    self.cytoband = ''      # CYTOBAND line
    self.express = []       # EXPRESS line, parsed on ';'
                            # Will be an array of strings
    self.restr_expr = ''    # RESTR_EXPR line
    self.gnm_terminus = ''  # GNM_TERMINUS line
    self.gene_id = ''       # GENE_ID line
    self.locuslink = ''     # LOCUSLINK line
    self.homol = ''         # HOMOL line
    self.chromosome = ''    # CHROMOSOME line
    self.protsim = []       # PROTSIM entries, array of Protsims
                            # Type ProtsimLine
    self.sequence = []      # SEQUENCE entries, array of Sequence entries
                            # Type SequenceLine
    self.sts = []           # STS entries, array of STS entries
                            # Type STSLine
    self.txmap = []         # TXMAP entries, array of TXMap entries
- __init__()
Initialize the class.
- __repr__()
Represent the UniGene Record object as a string for debugging.
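As a rough sketch of how a record container with a list-valued `express` field might look, the example below uses a standard-library dataclass. `RecordSketch` and `split_express` are hypothetical names for this illustration and cover only three of the fields listed above; this is not the Bio.UniGene `Record` class. The `;` separator follows the description above and is left as a parameter, since it is an assumption about the input.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecordSketch:
    """Hypothetical stand-in for a few Record fields; not the Bio.UniGene class."""

    ID: str = ""
    title: str = ""
    express: List[str] = field(default_factory=list)


def split_express(value, sep=";"):
    """Split an EXPRESS value into a list of stripped, non-empty strings.

    The separator defaults to ';' as described above; it is a parameter
    because the delimiter is an assumption about the input.
    """
    return [item.strip() for item in value.split(sep) if item.strip()]


# Made-up values for illustration only.
record = RecordSketch(ID="Hs.0", title="Hypothetical example cluster")
record.express = split_express("brain; liver ; kidney")
print(record.express)
```

Using `default_factory=list` gives every instance its own empty list, which mirrors the per-instance `self.express = []` initialization shown above.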
- Bio.UniGene.parse(handle)
Read and load UniGene records, for files containing multiple records.
- Bio.UniGene.read(handle)
Read and load a UniGene record, one record per file.
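The difference between the two functions (many records versus exactly one) can be sketched without Biopython. The helpers below are hypothetical and are not the Bio.UniGene implementation; they assume that a record ends at a line starting with `//`, and that a single-record reader should reject input holding zero or several records. The sample text is made up.

```python
import io


def iter_record_blocks(handle):
    """Yield each record as a list of lines, one record at a time.

    Assumption: a record ends at a line starting with '//'.
    Hypothetical helper; not the Bio.UniGene implementation.
    """
    block = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if block:
                yield block
            block = []
        elif line.strip():
            block.append(line)
    if block:
        yield block  # tolerate a final record without a terminator


def read_single_block(handle):
    """Return the only record in the handle, or raise ValueError."""
    blocks = iter_record_blocks(handle)
    first = next(blocks, None)
    if first is None:
        raise ValueError("No record found")
    if next(blocks, None) is not None:
        raise ValueError("More than one record found")
    return first


# Made-up two-record input.
text = "ID  Hs.0\nTITLE  First example\n//\nID  Hs.00\nTITLE  Second example\n//\n"
ids = [block[0] for block in iter_record_blocks(io.StringIO(text))]
print(ids)
```

In Biopython releases that include this module, the corresponding calls would be `UniGene.parse(handle)` in a `for` loop and `UniGene.read(handle)` for a single-record file, after `from Bio import UniGene`.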