kmerdb package

Submodules

kmerdb.database module

Copyright 2020 Matthew Ralston

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class kmerdb.database.SqliteKdb(filename: str, k: int)

Bases: object

kmerdb.database.histogram(conn)

kmerdb.distance module

kmerdb.distance.correlation(fname1, fname2)
kmerdb.distance.d2s(mono_x, mono_y, total_kmers_x, total_kmers_y, k, x, y)
kmerdb.distance.euclidean(fname1, fname2)
kmerdb.distance.hamming(k, x, y)
kmerdb.distance.spearman(x, y)

kmerdb.fileutil module

class kmerdb.fileutil.KDBReader(filename: str = None, fileobj: io.IOBase = None, mode: str = 'r', max_cache: int = 100)

Bases: Bio.bgzf.BgzfReader

profile

Loads the metadata blocks. The first two lines of the file are read first: the first line is the version, and the second is the number of metadata blocks.
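
As a rough illustration of reading two leading lines from a compressed file, the sketch below uses the standard-library gzip module, which can read BGZF files because BGZF is gzip-compatible. The layout (a version string on line 1, an integer block count on line 2) follows the description above, but the helper names and the toy file are hypothetical and the real .kdb header may contain more than this.

```python
import gzip
import os
import tempfile


def write_toy_file(path, version, n_blocks):
    """Write a hypothetical two-line header to a gzip file (illustration only)."""
    with gzip.open(path, "wt") as handle:
        handle.write(f"{version}\n{n_blocks}\n")


def read_leading_lines(path):
    """Return (version, n_blocks) parsed from the first two lines of the file."""
    with gzip.open(path, "rt") as handle:
        version = handle.readline().strip()
        n_blocks = int(handle.readline().strip())
    return version, n_blocks


# Round-trip a toy file in a temporary directory.
toy_path = os.path.join(tempfile.mkdtemp(), "toy.gz")
write_toy_file(toy_path, "0.0.1", 2)
header = read_leading_lines(toy_path)
```

KDBReader itself derives from Bio.bgzf.BgzfReader; plain gzip is used here only to keep the sketch free of third-party dependencies.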

slurp(dtype: str = 'int32')

A function to read an entire .kdb file into memory

class kmerdb.fileutil.KDBWriter(header: collections.OrderedDict, filename=None, mode='w', fileobj=None, compresslevel=6)

Bases: Bio.bgzf.BgzfWriter

compresslevel

Write the header to the file

kmerdb.fileutil.open(filepath, mode='r', *args)

kmerdb.index module

class kmerdb.index.IndexBuilder(kdbfile: str, k: int)

Bases: object

class kmerdb.index.IndexReader(indexfile: str)

Bases: object

kmerdb.index.has_index(kdbfile)
kmerdb.index.is_gz_file(filepath)
kmerdb.index.open(filepath, mode='r', k=None, idx=None)
kmerdb.index.read_line(kdbrdr: kmerdb.fileutil.KDBReader, kdbidx: kmerdb.index.IndexReader, kmer_id)
kmerdb.index.write_index(index: numpy.array, indexfile: str, k: int)

kmerdb.kmer module

class kmerdb.kmer.Kmers(k, strand_specific=True)

Bases: object

A wrapper class to pass variables through the multiprocessing pool

Variables
  • k – The choice of k to shred with

  • strand_specific – Include k-mers from forward strand only
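
The roles of k and strand_specific can be illustrated with a minimal sketch of k-mer shredding on a plain string. This is not the Kmers.shred implementation: the function names are hypothetical, the input is a str rather than a Bio.SeqRecord.SeqRecord, and treating strand_specific=False as "also include reverse-complement k-mers" is an assumption based on the description above.

```python
def reverse_complement(seq):
    """Reverse-complement an unambiguous DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def shred_sketch(seq, k, strand_specific=True):
    """List every k-length window of seq, in order.

    When strand_specific is False, windows from the reverse complement
    are appended as well (an assumption, not confirmed by the source).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not strand_specific:
        rc = reverse_complement(seq)
        kmers += [rc[i:i + k] for i in range(len(rc) - k + 1)]
    return kmers


forward_only = shred_sketch("ACGTA", 3)
both_strands = shred_sketch("ACGTA", 3, strand_specific=False)
```

A sequence of length n yields n - k + 1 k-mers per strand, so the 5-base example gives three 3-mers on the forward strand and three more from the reverse complement.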

shred(seqRecord)
Parameters

seqRecord (Bio.SeqRecord.SeqRecord) –

Returns

Return type

kmerdb.kmer.id_to_kmer(id, k)
kmerdb.kmer.kmer_to_id(s)

Convert a fixed-length k-mer string to its binary encoding, which is parameterized on that same k.

Note that converting a k-mer string to an integer id does not require k as an argument, because k is implicit in the length of the k-mer string.

Therefore, this function does not need to be wrapped in the Kmers class.

Parameters

s (str) – The input k-mer

Returns

The kPal-inspired binary encoding

Return type

int
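
A common way to realize such an encoding is to pack each base into two bits. The sketch below assumes the mapping A=0, C=1, G=2, T=3 and uses hypothetical names (encode_kmer, decode_kmer); the exact bit layout used by kmer_to_id and id_to_kmer is not stated here, so treat this as an illustration rather than the library's implementation.

```python
LETTER_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed mapping
BITS_TO_LETTER = "ACGT"


def encode_kmer(s):
    """Pack a k-mer string into an integer, two bits per base."""
    idx = 0
    for base in s:
        idx = (idx << 2) | LETTER_TO_BITS[base]
    return idx


def decode_kmer(idx, k):
    """Unpack an integer id into a k-mer string of length k."""
    letters = []
    for _ in range(k):
        letters.append(BITS_TO_LETTER[idx & 3])  # lowest two bits = last base
        idx >>= 2
    return "".join(reversed(letters))


kmer_id = encode_kmer("ACGT")
round_trip = decode_kmer(kmer_id, 4)
```

Encoding needs only the string, since its length fixes k. Decoding needs k explicitly because leading A bases contribute zero bits: under this mapping "A" and "AA" both encode to 0.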

kmerdb.kmer.neighbors(s, k)

kmerdb.parse module

kmerdb.parse.parsefile(filepath, k, p=1, b=50000, stranded=True)

Parse a single sequence file in blocks/chunks with multiprocessing support

Parameters
  • filepath (str) – Path to a fasta or fastq file

  • k (int) – Choice of k to shred k-mers with

  • p (int) – Number of processes

  • b (int) – Number of reads (per block) to process in parallel

  • stranded (bool) – Strand specificity argument for k-mer shredding process

Returns

(db, header_dictionary), where header_dictionary is the file’s metadata for the header block

Return type

(kmerdb.database.SqliteKdb, dict)
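
The block-wise strategy described by the p and b parameters can be sketched as follows. This is an illustration under stated assumptions, not parsefile itself: records are plain strings instead of fasta/fastq entries, counts are accumulated in a collections.Counter rather than a SqliteKdb database, no header dictionary is produced, and all function names are hypothetical.

```python
from collections import Counter
from itertools import islice
from multiprocessing import Pool


def blocks(records, b):
    """Yield successive lists of at most b records."""
    iterator = iter(records)
    while True:
        block = list(islice(iterator, b))
        if not block:
            return
        yield block


def count_kmers(job):
    """Count the k-mers of one sequence; job is a (sequence, k) tuple."""
    seq, k = job
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_in_blocks(records, k, p=1, b=50000):
    """Count k-mers block by block, using p worker processes when p > 1."""
    totals = Counter()
    pool = Pool(processes=p) if p > 1 else None
    try:
        for block in blocks(records, b):
            jobs = [(seq, k) for seq in block]
            partials = pool.map(count_kmers, jobs) if pool else map(count_kmers, jobs)
            for partial in partials:
                totals.update(partial)
    finally:
        if pool is not None:
            pool.close()
            pool.join()
    return totals


totals = count_in_blocks(["ACGT", "CGTA"], k=2, p=1, b=1)
```

Only one block of b records is held in memory at a time, which is the main reason to read in blocks instead of loading the whole file before counting.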

kmerdb.seqparser module

class kmerdb.seqparser.SeqParser(filepath, num, k)

Bases: object

A largely independent module that needs three pieces of information passed in from the outside.

header_dict()

Create a header dictionary, to be converted to YAML and placed in the header block of the compression header. The dictionary has a schema, defined in config.py, against which it is validated.

Returns

The header dictionary

Return type

dict

Module contents

kmerdb.cli()
kmerdb.distances(arguments)
kmerdb.get_matrix(arguments)
kmerdb.get_root_logger(level)
kmerdb.header(arguments)
kmerdb.hierarchical(arguments)

Thanks to https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/ for the inspiration

kmerdb.index_file(arguments)
kmerdb.kmeans(arguments)
kmerdb.markov_probability(arguments)
kmerdb.profile(arguments)
kmerdb.view(arguments)