kmerdb package
Submodules
kmerdb.database module
Copyright 2020 Matthew Ralston
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
class kmerdb.database.SqliteKdb(filename: str, k: int)
    Bases: object
kmerdb.database.histogram(conn)
kmerdb.distance module
kmerdb.distance.correlation(fname1, fname2)
kmerdb.distance.d2s(mono_x, mono_y, total_kmers_x, total_kmers_y, k, x, y)
kmerdb.distance.euclidean(fname1, fname2)
kmerdb.distance.hamming(k, x, y)
kmerdb.distance.spearman(x, y)
kmerdb.fileutil module
class kmerdb.fileutil.KDBReader(filename: str = None, fileobj: io.IOBase = None, mode: str = 'r', max_cache: int = 100)
    Bases: Bio.bgzf.BgzfReader
    profile
        Load the metadata blocks by reading the first two lines of the file: the first line is the version, followed by the number of metadata blocks.
    slurp(dtype: str = 'int32')
        Read an entire .kdb file into memory.
class kmerdb.fileutil.KDBWriter(header: collections.OrderedDict, filename=None, mode='w', fileobj=None, compresslevel=6)
    Bases: Bio.bgzf.BgzfWriter
    compresslevel
        Write the header to the file
kmerdb.fileutil.open(filepath, mode='r', *args)
kmerdb.index module
class kmerdb.index.IndexBuilder(kdbfile: str, k: int)
    Bases: object
class kmerdb.index.IndexReader(indexfile: str)
    Bases: object
kmerdb.index.has_index(kdbfile)
kmerdb.index.is_gz_file(filepath)
kmerdb.index.open(filepath, mode='r', k=None, idx=None)
kmerdb.index.read_line(kdbrdr: kmerdb.fileutil.KDBReader, kdbidx: kmerdb.index.IndexReader, kmer_id)
kmerdb.index.write_index(index: numpy.array, indexfile: str, k: int)
kmerdb.kmer module
class kmerdb.kmer.Kmers(k, strand_specific=True)
    Bases: object
    A wrapper class to pass variables through the multiprocessing pool
    Variables
        k – The choice of k to shred with
        strand_specific – Include k-mers from forward strand only
    shred(seqRecord)
        Parameters
            seqRecord (Bio.SeqRecord.SeqRecord) –
        Returns
        Return type
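The reference above does not say what shred returns, so the following is only a minimal sketch of what "shredding" a sequence into k-mers usually means: a sliding window of width k, optionally extended with reverse complements when strand specificity is turned off. The names shred_sketch and reverse_complement are hypothetical and are not part of kmerdb; the sketch also works on plain strings rather than Bio.SeqRecord.SeqRecord objects.

```python
# Hypothetical helper names; this is NOT the kmerdb implementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def shred_sketch(seq: str, k: int, strand_specific: bool = True) -> list:
    """Slide a window of width k across seq and collect every k-mer.

    With strand_specific=True only forward-strand k-mers are returned
    (matching the description of the strand_specific variable above).
    Otherwise the reverse complement of each k-mer is appended as well;
    that behaviour is an assumption made for illustration.
    """
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not strand_specific:
        kmers += [reverse_complement(kmer) for kmer in kmers]
    return kmers
```

A sequence of length n yields n - k + 1 forward k-mers, so shred_sketch("ACGTA", 3) produces three of them.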
kmerdb.kmer.id_to_kmer(id, k)
kmerdb.kmer.kmer_to_id(s)
    Convert a fixed-length k-mer string to its binary-encoded integer id.
    The conversion is consistent regardless of k, because k is implicit in the length of the k-mer string. This function therefore does not need to be a method of the Kmers class.
    Parameters
        s (str) – The input k-mer
    Returns
        The kPal-inspired binary encoding
    Return type
        int
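To make the idea of a binary k-mer encoding concrete, here is a minimal sketch of a 2-bit-per-base scheme. The letter-to-bits mapping (A=0, C=1, G=2, T=3) and the function names kmer_to_id_sketch and id_to_kmer_sketch are assumptions chosen for illustration; the actual kmerdb encoding may differ.

```python
# Assumed 2-bit alphabet; the real kmerdb mapping may differ.
LETTER_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_LETTER = "ACGT"


def kmer_to_id_sketch(s: str) -> int:
    """Pack a k-mer string into an integer, two bits per base."""
    idx = 0
    for letter in s.upper():
        # Shift the bases seen so far left by two bits, then append this one.
        idx = (idx << 2) | LETTER_TO_BITS[letter]
    return idx


def id_to_kmer_sketch(idx: int, k: int) -> str:
    """Unpack an integer id back into a k-mer string of length k."""
    letters = []
    for _ in range(k):
        letters.append(BITS_TO_LETTER[idx & 3])  # lowest two bits = last base
        idx >>= 2
    return "".join(reversed(letters))
```

Under this assumed mapping, encoding needs only the string because its length supplies k, whereas decoding must be given k explicitly: leading A bases encode as zero bits, so the ids of "A" and "AAAA" are both 0.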
kmerdb.kmer.neighbors(s, k)
kmerdb.parse module
kmerdb.parse.parsefile(filepath, k, p=1, b=50000, stranded=True)
    Parse a single sequence file in blocks/chunks with multiprocessing support
    Parameters
        filepath (str) – Path to a fasta or fastq file
        k (int) – Choice of k to shred k-mers with
        p (int) – Number of processes
        b (int) – Number of reads (per block) to process in parallel
        stranded (bool) – Strand specificity argument for k-mer shredding process
    Returns
        (db, header_dictionary), where header_dictionary is the file’s metadata for the header block
    Return type
        (kmerdb.database.SqliteKdb, dict)
kmerdb.seqparser module
class kmerdb.seqparser.SeqParser(filepath, num, k)
    Bases: object
    Largely independent module; needs 3 pieces of information passed in from the outside
    header_dict()
        Create a header dictionary, to be converted to YAML and placed in the header block of the compression header. The dictionary has a schema, defined in config.py, against which it is validated.
        Returns
            dict
        Return type
Module contents
kmerdb.cli()
kmerdb.distances(arguments)
kmerdb.get_matrix(arguments)
kmerdb.get_root_logger(level)
kmerdb.header(arguments)
kmerdb.hierarchical(arguments)
    Thanks to https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/ for the inspiration
kmerdb.index_file(arguments)
kmerdb.kmeans(arguments)
kmerdb.markov_probability(arguments)
kmerdb.profile(arguments)
kmerdb.view(arguments)