datasci_tools package

Submodules

datasci_tools.abstract_class_utils module

Ex: You can create an abstract class with an abstract method, define functionality inside that abstract method, and then reference that functionality when overriding the abstract method in the child class (see the sketch after the class listing below).

class datasci_tools.abstract_class_utils.Dog[source]

Bases: ABC

abstract bark()[source]
class datasci_tools.abstract_class_utils.Doggy[source]

Bases: Dog

__init__()[source]
bark()[source]
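A minimal sketch of this pattern (not necessarily the exact implementation of Dog/Doggy in this module): the abstract method carries shared behavior that the child reuses through super().

from abc import ABC, abstractmethod

class Dog(ABC):
    @abstractmethod
    def bark(self):
        # an abstract method may still define shared functionality
        print("generic bark")

class Doggy(Dog):
    def __init__(self):
        super().__init__()

    def bark(self):
        # reference the abstract method's functionality, then extend it
        super().bark()
        print("doggy bark")

Doggy().bark()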

datasci_tools.algorithms_utils module

Purpose: defining general algorithms that help with processing

datasci_tools.algorithms_utils.compare_uneven_groups(group1, group2, comparison_func, group_name='no_name', return_differences=False, print_flag=False)[source]

Pseudocode: will return the items from each group that are not in the other group

Example:

from datasci_tools import algorithms_utils as au
au = reload(au)
au.compare_uneven_groups(
    obj1.inside_pieces[:10],
    obj2.inside_pieces,
    comparison_func=tu.compare_meshes_by_face_midpoints,
    group_name="inside_pieces",
    return_differences=True,
)

def equal_func(a, b):
    return a == b

au.compare_uneven_groups(
    [1, 3, 5, 7, 9, 10],
    [2, 4, 6, 8, 10],
    comparison_func=equal_func,
    group_name="numbers_list",
    return_differences=True,
)

datasci_tools.argparse_utils module

Purpose: Go through the specifics of how to enable a command line interface for a Python script

Tutorial: https://realpython.com/command-line-interfaces-python-argparse/
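A minimal argparse sketch along the lines of the tutorial above; the script arguments here are illustrative, not this module's API.

import argparse

def main():
    parser = argparse.ArgumentParser(description="example command line interface")
    parser.add_argument("input_file", help="positional argument")
    parser.add_argument("--verbose", action="store_true", help="optional boolean flag")
    parser.add_argument("--n-iterations", type=int, default=10, help="optional int with a default")
    args = parser.parse_args()
    if args.verbose:
        print(f"processing {args.input_file} for {args.n_iterations} iterations")

if __name__ == "__main__":
    main()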

argument_options_notes=

argparse_arguments_notes =

datasci_tools.argparse_utils.example_basic_argparse_test()[source]
datasci_tools.argparse_utils.print_help_str(parser)[source]

datasci_tools.data_struct_utils module

class datasci_tools.data_struct_utils.DictType(*args, **kwargs)[source]

Bases: object

Purpose: To have a dictionary-like object that also stores the default types of the data if need be

Ex:

from datasci_tools import data_struct_utils as dsu
my_obj = dsu.DictType(hello=("x", str), hi=(int, 5), yes=6)
my_obj2 = dsu.DictType(hello=("xy", str), his=(int, 5), yes=(10, bool))
my_obj

from datasci_tools import general_utils as gu
output_obj = gu.merge_dicts([my_obj, my_obj2, {}])
output_obj

__init__(*args, **kwargs)[source]
asdict()[source]
copy()[source]
items()[source]
keys()[source]
lowercase()[source]
parse_types_from_dict(parse_verbose=False, **kwargs)[source]
update(B)[source]
values()[source]
datasci_tools.data_struct_utils.parse_type_from_entry(v, default_type=None)[source]

datasci_tools.dataclass_utils module

To explore the use cases of the dataclass module

— source 1 — link: https://www.dataquest.io/blog/how-to-use-python-data-classes/#:~:text=In%20Python%2C%20a%20data%20class,a%20program%20or%20a%20system.

What is the purpose of dataclass? 1) To easily create a class meant just for storing data, which already has a number of the basic functions created for you: __init__, __eq__, __repr__

  • advantage: Don’t have to create own init function

2) With other options for the decorator (like order=True), other functions are implemented as well: __lt__ (less than), __le__ (less or equal), __gt__ (greater than), and __ge__ (greater or equal)

  3. Can set the object to frozen so that nothing can be changed

  4. Can inherit from one dataclass to the next

How to use it? Just import the dataclass decorator and apply it to the class

What did it use to look like? We used to have to implement the class from scratch, like:

class Person():
    def __init__(self, name='Joe', age=30, height=1.85, email='joe@dataquest.io'):
        self.name = name
        self.age = age
        self.height = height
        self.email = email

    def __eq__(self, other):
        if isinstance(other, Person):
            return (self.name, self.age, self.height, self.email) == (
                other.name, other.age, other.height, other.email)
        return NotImplemented
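For comparison, a short sketch of the same class written with the dataclass decorator (field values mirror the example above):

from dataclasses import dataclass

@dataclass(order=True)  # order=True also generates __lt__, __le__, __gt__, __ge__
class Person:
    name: str = 'Joe'
    age: int = 30
    height: float = 1.85
    email: str = 'joe@dataquest.io'

# __init__, __repr__ and __eq__ are generated automatically
assert Person() == Person(name='Joe', age=30, height=1.85, email='joe@dataquest.io')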

class datasci_tools.dataclass_utils.DataclassSubscript[source]

Bases: object

__init__() → None
datasci_tools.dataclass_utils.examples()[source]

datasci_tools.datajoint_motifs module

Part 1: You have a Set table (Ex: Segment Set)

Nested Compute

Mesh Table: will hold the abstract ideas. Mesh Maker: a Computed Part table that has a make function, in charge of mapping how a segment becomes a mesh

Optimal make method (sketched below): 1) Get the items from the key 2) Run the method that will turn them into an object 3) Add the method output and the key together in a dictionary 4) Insert the result
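A hedged sketch of this make pattern with DataJoint; the Segment table and the segment_to_mesh helper are placeholders, not objects defined in this module.

import datajoint as dj

schema = dj.schema('example_schema')   # assumes a configured DataJoint connection

@schema
class Mesh(dj.Computed):
    definition = '''
    -> Segment
    ---
    n_vertices : int
    '''

    def make(self, key):
        # assumes a Segment table already exists in this schema
        segment = (Segment & key).fetch1()               # 1) get the items from the key
        mesh = segment_to_mesh(segment)                  # 2) run the method that builds the object (hypothetical helper)
        row = dict(key, n_vertices=len(mesh.vertices))   # 3) combine the output and the key in a dictionary
        self.insert1(row)                                # 4) insert the result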

Terminology: 1) Entity: any top-level table wrapping an object (Ex: Segment, Mesh) 2) Method: a top-level table that aggregates the nested methods:

-> nested method: a single method with some parameters (there is only one thing that the table does, e.g. turn a segment into a mesh)

  3. Maker: A nested compute (e.g. one that makes meshes)

  4. Nested Store:

Entity A has a get function that gets a single object in the nested store; the nested store itself provides get and put.

A Maker class has: 1) Entity A 2) Method B 3) Entity B

datasci_tools.dict_utils module

datasci_tools.dj_utils module

datasci_tools.docstring_utils module

Tutorial 1: Sphinx docstring format

source: https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html#:~:text=The%20Sphinx%20docstring%20format,-In%20general%2C%20a&text=A%20pair%20of%20%3Aparam%3A%20and,values%20returned%20by%20our%20code.

Documentation for the different docstring configurations with autoDocstring:

https://marketplace.visualstudio.com/items?itemName=njpwerner.autodocstring
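A short sketch of the Sphinx :param:/:returns: field style the tutorial above describes (the function itself is just a placeholder):

def scale(values, factor=2.0):
    """Multiply every value by a constant factor.

    :param values: sequence of numbers to scale
    :type values: list[float]
    :param factor: multiplier applied to each value, defaults to 2.0
    :type factor: float, optional
    :returns: the scaled values
    :rtype: list[float]
    """
    return [v * factor for v in values]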

datasci_tools.docstring_utils.docstring_with_code_example()[source]

Returns a list of cycles which form a basis for cycles of G.

A basis for cycles of a network is a minimal collection of cycles such that any cycle in the network can be written as a sum of cycles in the basis. Here summation of cycles is defined as “exclusive or” of the edges. Cycle bases are useful, e.g. when deriving equations for electric circuits using Kirchhoff’s Laws.

Parameters:
  • G (NetworkX Graph) –

  • root (node, optional) – Specify starting node for basis.

Returns:

  • A list of cycle lists. Each cycle list is a list of nodes which forms a cycle (loop) in G.

Examples

>>> G = nx.Graph()
>>> nx.add_cycle(G, [0, 1, 2, 3])
>>> nx.add_cycle(G, [0, 3, 4, 5])
>>> nx.cycle_basis(G, 0)
[[3, 4, 5, 0], [1, 2, 3, 0]]

Notes

This is adapted from algorithm CACM 491 [1].

References

See also

simple_cycles

datasci_tools.docstring_utils.myfunc(x=10, y=20)[source]

_summary_

Parameters:
  • x (int, optional) – _description_, by default 10

  • y (int, optional) – _description_, by default 20

Returns:

_description_

Return type:

_type_

datasci_tools.docstring_utils.simple_arg_return_docstring()[source]

function to turn a trimesh object of a neuron into a skeleton, without running soma collapse, or recasting result into a Skeleton. Used by meshparty.skeletonize.skeletonize_mesh() and makes use of meshparty.skeletonize.skeletonize_components()

Parameters:
  • mesh (meshparty.trimesh_io.Mesh) – the mesh to skeletonize, defaults assume vertices in nm

  • soma_pt (np.array) – a length 3 array specifying the soma location to make the root; default=None, in which case a heuristic root will be chosen, in units of mesh vertices

  • soma_thresh (float) – distance in mesh vertex units over which to consider mesh vertices close to soma_pt as belonging to the soma; these vertices will automatically be invalidated and no skeleton branches will attempt to reach them. This distance will also be used to collapse all skeleton points within this distance to the soma_pt root if collapse_soma is true. (default=7500 (nm))

  • invalidation_d (float) – the distance along the mesh to invalidate when applying TEASAR like algorithm. Controls how detailed a structure the skeleton algorithm reaches. default (10000 (nm))

  • smooth_neighborhood (int) – the neighborhood in edge hops over which to smooth skeleton locations. This controls the smoothing of the skeleton (default 5)

  • large_skel_path_threshold (int) – the threshold in terms of skeleton vertices that skeletons will be nominated for tip merging. Smaller skeleton fragments will not be merged at their tips (default 5000)

  • cc_vertex_thresh (int) – the threshold in terms of vertex numbers that connected components of the mesh will be considered for skeletonization. mesh connected components with fewer than these number of vertices will be ignored by skeletonization algorithm. (default 100)

  • return_map (bool) – whether to return a map of how each mesh vertex maps onto each skeleton vertex based upon how it was invalidated.

Returns:

  • skel_verts (np.array) – a Nx3 matrix of skeleton vertex positions

  • skel_edges (np.array) – a Kx2 matrix of skeleton edge indices into skel_verts

  • smooth_verts (np.array) – a Nx3 matrix of vertex positions after smoothing

  • skel_verts_orig (np.array) – a N long index of skeleton vertices in the original mesh vertex index

  • (mesh_to_skeleton_map) (np.array) – a Mx2 map of mesh vertex indices to skeleton vertex indices

datasci_tools.docstring_utils.summary_from_docstring(docstring, verbose=False)[source]

Purpose: To extract the summary portion of the docstring

Example

my_str = '''This is some docstring
blah blah blah blah

Other
'''

summary_from_docstring(my_str)

datasci_tools.docstring_utils.very_advanced_docstring()[source]

Base class for undirected graphs.

A Graph stores nodes and edges with optional data, or attributes.

Graphs hold undirected edges. Self loops are allowed but multiple (parallel) edges are not.

Nodes can be arbitrary (hashable) Python objects with optional key/value attributes, except that None is not allowed as a node.

Edges are represented as links between nodes with optional key/value attributes.

Parameters:
  • incoming_graph_data (input graph (optional, default: None)) – Data to initialize graph. If None (default) an empty graph is created. The data can be any format that is supported by the to_networkx_graph() function, currently including edge list, dict of dicts, dict of lists, NetworkX graph, 2D NumPy array, SciPy sparse matrix, or PyGraphviz graph.

  • attr (keyword arguments, optional (default= no attributes)) – Attributes to add to graph as key=value pairs.

See also

DiGraph, MultiGraph, MultiDiGraph

Examples

Create an empty graph structure (a “null graph”) with no nodes and no edges.

>>> G = nx.Graph()

G can be grown in several ways.

Nodes:

Add one node at a time:

>>> G.add_node(1)

Add the nodes from any container (a list, dict, set or even the lines from a file or the nodes from another graph).

>>> G.add_nodes_from([2, 3])
>>> G.add_nodes_from(range(100, 110))
>>> H = nx.path_graph(10)
>>> G.add_nodes_from(H)

In addition to strings and integers any hashable Python object (except None) can represent a node, e.g. a customized node object, or even another Graph.

>>> G.add_node(H)

Edges:

G can also be grown by adding edges.

Add one edge,

>>> G.add_edge(1, 2)

a list of edges,

>>> G.add_edges_from([(1, 2), (1, 3)])

or a collection of edges,

>>> G.add_edges_from(H.edges)

If some edges connect nodes not yet in the graph, the nodes are added automatically. There are no errors when adding nodes or edges that already exist.

Attributes:

Each graph, node, and edge can hold key/value attribute pairs in an associated attribute dictionary (the keys must be hashable). By default these are empty, but can be added or changed using add_edge, add_node or direct manipulation of the attribute dictionaries named graph, node and edge respectively.

>>> G = nx.Graph(day="Friday")
>>> G.graph
{'day': 'Friday'}

Add node attributes using add_node(), add_nodes_from() or G.nodes

>>> G.add_node(1, time="5pm")
>>> G.add_nodes_from([3], time="2pm")
>>> G.nodes[1]
{'time': '5pm'}
>>> G.nodes[1]["room"] = 714  # node must exist already to use G.nodes
>>> del G.nodes[1]["room"]  # remove attribute
>>> list(G.nodes(data=True))
[(1, {'time': '5pm'}), (3, {'time': '2pm'})]

Add edge attributes using add_edge(), add_edges_from(), subscript notation, or G.edges.

>>> G.add_edge(1, 2, weight=4.7)
>>> G.add_edges_from([(3, 4), (4, 5)], color="red")
>>> G.add_edges_from([(1, 2, {"color": "blue"}), (2, 3, {"weight": 8})])
>>> G[1][2]["weight"] = 4.7
>>> G.edges[1, 2]["weight"] = 4

Warning: we protect the graph data structure by making G.edges a read-only dict-like structure. However, you can assign to attributes in e.g. G.edges[1, 2]. Thus, use 2 sets of brackets to add/change data attributes: G.edges[1, 2][‘weight’] = 4 (For multigraphs: MG.edges[u, v, key][name] = value).

Shortcuts:

Many common graph features allow python syntax to speed reporting.

>>> 1 in G  # check if node in graph
True
>>> [n for n in G if n < 3]  # iterate through nodes
[1, 2]
>>> len(G)  # number of nodes in graph
5

Often the best way to traverse all edges of a graph is via the neighbors. The neighbors are reported as an adjacency-dict G.adj or G.adjacency()

>>> for n, nbrsdict in G.adjacency():
...     for nbr, eattr in nbrsdict.items():
...         if "weight" in eattr:
...             # Do something useful with the edges
...             pass

But the edges() method is often more convenient:

>>> for u, v, weight in G.edges.data("weight"):
...     if weight is not None:
...         # Do something useful with the edges
...         pass

Reporting:

Simple graph information is obtained using object-attributes and methods. Reporting typically provides views instead of containers to reduce memory usage. The views update as the graph is updated similarly to dict-views. The objects nodes, edges and adj provide access to data attributes via lookup (e.g. nodes[n], edges[u, v], adj[u][v]) and iteration (e.g. nodes.items(), nodes.data(‘color’), nodes.data(‘color’, default=’blue’) and similarly for edges) Views exist for nodes, edges, neighbors()/adj and degree.

For details on these and other miscellaneous methods, see below.

Subclasses (Advanced):

The Graph class uses a dict-of-dict-of-dict data structure. The outer dict (node_dict) holds adjacency information keyed by node. The next dict (adjlist_dict) represents the adjacency information and holds edge data keyed by neighbor. The inner dict (edge_attr_dict) represents the edge data and holds edge attribute values keyed by attribute names.

Each of these three dicts can be replaced in a subclass by a user defined dict-like object. In general, the dict-like features should be maintained but extra features can be added. To replace one of the dicts create a new graph class by changing the class(!) variable holding the factory for that dict-like structure.

node_dict_factory : function, (default: dict)

Factory function to be used to create the dict containing node attributes, keyed by node id. It should require no arguments and return a dict-like object

node_attr_dict_factory: function, (default: dict)

Factory function to be used to create the node attribute dict which holds attribute values keyed by attribute name. It should require no arguments and return a dict-like object

adjlist_outer_dict_factory : function, (default: dict)

Factory function to be used to create the outer-most dict in the data structure that holds adjacency info keyed by node. It should require no arguments and return a dict-like object.

adjlist_inner_dict_factory : function, (default: dict)

Factory function to be used to create the adjacency list dict which holds edge data keyed by neighbor. It should require no arguments and return a dict-like object

edge_attr_dict_factory : function, (default: dict)

Factory function to be used to create the edge attribute dict which holds attribute values keyed by attribute name. It should require no arguments and return a dict-like object.

graph_attr_dict_factory : function, (default: dict)

Factory function to be used to create the graph attribute dict which holds attribute values keyed by attribute name. It should require no arguments and return a dict-like object.

Typically, if your extension doesn’t impact the data structure all methods will inherit without issue except: to_directed/to_undirected. By default these methods create a DiGraph/Graph class and you probably want them to create your extension of a DiGraph/Graph. To facilitate this we define two class variables that you can set in your subclass.

to_directed_class : callable, (default: DiGraph or MultiDiGraph)

Class to create a new graph structure in the to_directed method. If None, a NetworkX class (DiGraph or MultiDiGraph) is used.

to_undirected_class : callable, (default: Graph or MultiGraph)

Class to create a new graph structure in the to_undirected method. If None, a NetworkX class (Graph or MultiGraph) is used.

Subclassing Example

Create a low memory graph class that effectively disallows edge attributes by using a single attribute dict for all edges. This reduces the memory used, but you lose edge attributes.

>>> class ThinGraph(nx.Graph):
...     all_edge_dict = {"weight": 1}
...
...     def single_edge_dict(self):
...         return self.all_edge_dict
...
...     edge_attr_dict_factory = single_edge_dict
>>> G = ThinGraph()
>>> G.add_edge(2, 1)
>>> G[2][1]
{'weight': 1}
>>> G.add_edge(2, 2)
>>> G[2][1] is G[2][2]
True

datasci_tools.dotmotif_utils module

datasci_tools.enum_utils module

Purpose: To implement an enum (non-native) in Python where the members have an order

Purpose of enum: named constants
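A minimal sketch of ordered named constants using the standard enum module (illustrative only; not necessarily how this module implements it):

from enum import IntEnum

class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# members are named constants with a well-defined order
assert Severity.LOW < Severity.HIGH
assert list(Severity) == [Severity.LOW, Severity.MEDIUM, Severity.HIGH]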

datasci_tools.enum_utils.example()[source]

datasci_tools.file_utils module

datasci_tools.filtering_utils module

datasci_tools.function_utils module

methods for helping inspect functions

datasci_tools.function_utils.all_functions_from_module(module, return_only_names=False, return_only_functions=False)[source]
datasci_tools.function_utils.arg_names(func)[source]

Purpose: To get the names of the arguments

from datasci_tools import function_utils as funcu
funcu.arg_names(myfunc)

datasci_tools.function_utils.rename(newname)[source]

Can be used as a decorator to rename a function
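A sketch of how such a rename decorator is commonly written and used (illustrative, not necessarily this module's exact implementation):

def rename(newname):
    def decorator(func):
        func.__name__ = newname
        return func
    return decorator

@rename('better_name')
def some_helper():
    pass

assert some_helper.__name__ == 'better_name'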

datasci_tools.general_utils module

datasci_tools.hash_utils module

Purpose: functions for quick hashing

datasci_tools.hash_utils.hash_str(string, max_length=10)[source]

Purpose: To hash a string and truncate it to a certain length

Example:

from datasci_tools import hash_utils as shu
shu.hash_str("The quick brown fox")

datasci_tools.inspect_utils module

Useful wrappers and functions for the inspect module

datasci_tools.inspect_utils.built_in_func_from_name(obj_name)[source]
datasci_tools.inspect_utils.function_code_as_str(func)[source]
datasci_tools.inspect_utils.function_names(module, verbose=False)[source]

Purpose: return all function names from module

datasci_tools.inspect_utils.getcomments(func)[source]
datasci_tools.inspect_utils.global_vars(module, verbose=False)[source]

Purpose: Will return the names of the global variables defined in the module

from datasci_tools import inspect_utils as iu
from datasci_tools import numpy_utils as nu
iu.global_vars(nu, verbose=True)

datasci_tools.inspect_utils.is_global_var_by_value(obj, verbose=False)[source]

datasci_tools.ipyvolume_movie_utils module

datasci_tools.ipyvolume_utils module

datasci_tools.json_utils module

datasci_tools.linalg_utils module

datasci_tools.linalg_utils.error_from_projection(vector_to_project, line_of_projection, idx_for_projection=None, verbose=False)[source]

Purpose: To return the error vector of a projection

Ex:

lu.error_from_projection(
    vector_to_project=orienting_coords["top_left"],
    line_of_projection=hu.upward_vector_middle_non_scaled,
    verbose=True,
    idx_for_projection=np.arange(0, 2),
)

datasci_tools.linalg_utils.perpendicular_vec_2D(vec)[source]
datasci_tools.linalg_utils.projection(vector_to_project, line_of_projection, idx_for_projection=None, verbose=False, return_magnitude=False)[source]

Purpose: Will find the projection of a vector onto a line
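A worked sketch of the underlying formula with plain numpy (proj_b(a) = (a·b / b·b) b, error = a - proj); the function name here is illustrative:

import numpy as np

def project_onto_line(vector_to_project, line_of_projection):
    a = np.asarray(vector_to_project, dtype=float)
    b = np.asarray(line_of_projection, dtype=float)
    proj = (a @ b) / (b @ b) * b   # projection of a onto the line spanned by b
    error = a - proj               # component of a perpendicular to b
    return proj, error

proj, err = project_onto_line([3.0, 4.0], [1.0, 0.0])
# proj -> array([3., 0.]), err -> array([0., 4.])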

datasci_tools.linalg_utils.rotation_matrix_2D(angle)[source]

datasci_tools.logging_utils module

Covers the basics of the logging module (built in python)

— source 1 — link: https://towardsdatascience.com/logging-in-python-a1415d0b8141

Standard levels (WARNING is the default level):

  Level      Value
  CRITICAL   50
  ERROR      40
  WARNING    30
  INFO       20
  DEBUG      10

Purpose of the module (without changing any code except the basicConfig call): 1) To make print statements really easy to turn off and on at different levels 2) To output any print/debug statements to a file without changing the code 3) To print out other things (like time and traceback calls)

When would you use it? 1) debugging, 2) usage monitoring 3) performance monitoring

How to use: 1) put logging.<level>("some message") in the code 2) set logging.basicConfig(level=logging.LEVEL) - only that level and above will be printed or executed 3) set more args in basicConfig to:

  • print to text

  • change output formatting

# – named logger vs root logger –

1) How to use the root logger: logging.debug("") OR logger = logging.getLogger(); logger.debug("")

2) Using a named logger so that you are able to tell where the logging came from: logger = logging.getLogger("spam_application"); logger.debug("")

To automatically create a specific logger for every distinct module use: logger = logging.getLogger(__name__)
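A short sketch putting these pieces together (basicConfig plus a module-level named logger):

import logging

# only messages at INFO level and above will be emitted;
# add filename='app.log' to basicConfig to write to a file instead of the console
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(name)s %(levelname)s: %(message)s',
)

logger = logging.getLogger(__name__)   # one named logger per module

logger.debug('not shown (below INFO)')
logger.info('processing started')
logger.warning('something looks off')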

datasci_tools.logging_utils.examples()[source]

datasci_tools.matlab_utils module

datasci_tools.matlab_utils.loadmat(filepath)[source]

datasci_tools.matplotlib_utils module

datasci_tools.mesh_utils module

datasci_tools.module_utils module

datasci_tools.networkx_utils module

datasci_tools.new_module module

New module to test the automatic build of documentation using GitHub Pages

datasci_tools.numpy_dep module

datasci_tools.numpy_dep.abs(*args, **kwargs)[source]
datasci_tools.numpy_dep.array(*args, **kwargs)[source]
datasci_tools.numpy_dep.max(*args, **kwargs)[source]
datasci_tools.numpy_dep.min(*args, **kwargs)[source]
datasci_tools.numpy_dep.round(*args, **kwargs)[source]

datasci_tools.numpy_utils module

datasci_tools.object_oriented_utils module

To provide information and functionality for object oriented programming

datasci_tools.object_oriented_utils.example_slots()[source]

Purpose: if you don't want to be able to add just any named attribute to an object dynamically (because normally you can), you can specify all possible attributes up front

Why? So that the instance.__dict__ doesn’t grow large
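A minimal sketch of the __slots__ idea described above (the class and attribute names are illustrative):

class Point:
    __slots__ = ('x', 'y')   # only these attributes may ever be set

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# p.z = 3 would raise AttributeError, and instances carry no per-instance __dict__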

datasci_tools.package_utils module

datasci_tools.pandas_utils module

datasci_tools.pathlib_utils module

datasci_tools.pathlib_utils.copy_file(filepath, destination)[source]
datasci_tools.pathlib_utils.create_folder(folder_path)[source]

To create a new folder

from datasci_tools import pathlib_utils as plu
plu.create_folder("/mnt/dj-stor01/platinum/minnie65/02/graphs")

datasci_tools.pathlib_utils.ext(path)[source]
datasci_tools.pathlib_utils.filename(path)[source]
datasci_tools.pathlib_utils.filename_no_ext(path)[source]
datasci_tools.pathlib_utils.files_of_ext_type(directory, ext='py', verbose=False, return_stem=False)[source]

Purpose: Get all files with a certain extension

datasci_tools.pathlib_utils.inside_directory(directory, filepath)[source]

Ex:

from pathlib import Path
from datasci_tools import pathlib_utils as plu

root = Path("/datasci_tools/datasci_tools/")  # .resolve()
child = Path("../datasci_tools/numpy_utils.py")  # .resolve()
plu.inside_directory(root, child)

datasci_tools.pathlib_utils.n_levels_parent_above(parent, filepath, verbose=False)[source]

Purpose: Find the number of levels a parent directory is above a filepath

Pseudocode: 1) get the relative path 2) count the number of backslashes

Ex:

plu.n_levels_parent_above(
    filepath=Path("/datasci_tools/datasci_tools/dj_utils.py"),
    parent="/datasci_tools/",
    verbose=True,
)

datasci_tools.pathlib_utils.parent_directory(path)[source]
datasci_tools.pathlib_utils.parents(path)[source]
datasci_tools.pathlib_utils.py_files(directory, ext='py', verbose=False, return_stem=False)[source]
datasci_tools.pathlib_utils.relative_path_of_parent(parent, filepath)[source]

Purpose: Find the relative path to parent

datasci_tools.pathlib_utils.relative_to_absolute_path(path)
datasci_tools.pathlib_utils.resolve(path)[source]

datasci_tools.pipeline module

datasci_tools.pretty_print_confusion_matrix module

Plot a pretty confusion matrix with seaborn.

Created on Mon Jun 25 14:17:37 2018 @author: Wagner Cipriano - wagnerbhbr - gmail - CEFETMG / MMC

References

https://www.mathworks.com/help/nnet/ref/plotconfusion.html
https://stackoverflow.com/questions/28200786/how-to-plot-scikit-learn-classification-report
https://stackoverflow.com/questions/5821125/how-to-plot-confusion-matrix-with-string-axis-rather-than-integer-in-python
https://www.programcreek.com/python/example/96197/seaborn.heatmap
https://stackoverflow.com/questions/19233771/sklearn-plot-confusion-matrix-with-labels/31720054
http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py

datasci_tools.pretty_print_confusion_matrix.configcell_text_and_colors(array_df, lin, col, oText, facecolors, posi, fz, fmt, show_null_values=0)[source]

Configure cell text and colors and return the text elements to add and to delete. @TODO: use fmt

datasci_tools.pretty_print_confusion_matrix.get_new_fig(fn, figsize=[9, 9])[source]

Init graphics

datasci_tools.pretty_print_confusion_matrix.insert_totals(df_cm)[source]

Insert the totals column and row (the last ones)

datasci_tools.pretty_print_confusion_matrix.plot_confusion_matrix_from_data(y_test, predictions, columns=None, annot=True, cmap='Oranges', fmt='.2f', fz=11, lw=0.5, cbar=False, figsize=[8, 8], show_null_values=0, pred_val_axis='lin', axes_fontsize=30, title='Confusion matrix', title_fontsize=30, ticklabel_fontsize=15)[source]

Plot the confusion matrix from y_test (actual values) and predictions (predicted values), without a precomputed confusion matrix

datasci_tools.pretty_print_confusion_matrix.pretty_plot_confusion_matrix(df_cm, annot=True, cmap='Oranges', fmt='.2f', fz=11, lw=0.5, cbar=False, figsize=[8, 8], show_null_values=0, pred_val_axis='y', axes_fontsize=30, title='Confusion matrix', title_fontsize=30, ticklabel_fontsize=15)[source]

Print the confusion matrix with a default layout (like MATLAB). params:

  • df_cm: dataframe (pandas) without totals
  • annot: print text in each cell
  • cmap: Oranges, Oranges_r, YlGnBu, Blues, RdBu, ...
  • fz: fontsize
  • lw: linewidth
  • pred_val_axis: where to show the predicted values (x or y axis)
    - 'col' or 'x': show predicted values in columns (x axis) instead of rows
    - 'lin' or 'y': show predicted values in rows (y axis)
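A minimal usage sketch based on the signature above (the labels are made-up data):

from datasci_tools import pretty_print_confusion_matrix as ppcm

y_test      = [0, 0, 1, 1, 2, 2, 2, 1]
predictions = [0, 1, 1, 1, 2, 0, 2, 1]

# builds the confusion matrix from the raw labels and plots it
ppcm.plot_confusion_matrix_from_data(y_test, predictions, columns=['a', 'b', 'c'])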

datasci_tools.pydantic_utls module

Purpose of the module: a module that has: 1) A base class that a custom class can inherit from: - will automatically run checks on the types of the data arguments (or convert the data to the right type) - has pre-written functions like outputting dictionaries and JSON

  2. Different ways to put constraints on the data

Overall purpose: to more securely store data with checks

— Base settings: link: https://pavledjuric.medium.com/how-to-configure-the-settings-of-your-python-app-with-pydantic-d8191113dcb8

What problem is it solving: 1) how to store API keys and sensitive information in a class without necessarily hard coding it 2) apply type hinting to check arguments and hard coded values

How does it do that? It will first try to extract these values from environment variables, and only if they are not found in the environment will it fall back to the hardcoded values

What happens: if a value is defined as an environment variable, it will use that (so you don't have to hardcode it)
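A sketch of the BaseSettings idea, assuming pydantic v1-style imports (in pydantic v2, BaseSettings lives in the separate pydantic-settings package); the field names here are illustrative:

from pydantic import BaseModel, BaseSettings

class Settings(BaseSettings):
    # if an API_KEY environment variable is set it wins;
    # the hardcoded value is only a fallback
    api_key: str = 'not-a-real-key'
    timeout_s: int = 30          # '30' from the environment would be coerced to int

class Record(BaseModel):
    name: str
    score: float                 # '3.5' is coerced to 3.5

settings = Settings()
record = Record(name='example', score='3.5')
print(record.dict())             # pre-written helper for dictionary output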

datasci_tools.pydantic_utls.example()[source]

datasci_tools.regex_utils module

Note: - * only works at the beginning of string

Rules:
  . : any single character
  .* : zero or more characters, scanning all of them (greedy)
  .+ : one or more characters, scanning all of them (greedy)
  .+? : one or more characters, but non-greedy (matches as few as possible)
  (.*) : zero or more characters, scan all of them, and collect the whole thing as a group (even if nothing is done with it)

# — cheat sheet —: https://www.rexegg.com/regex-quickstart.html
  . : any single character
  [] : one of the characters in the brackets
  []+ : one or more characters, where every character is in one of the categories in the brackets
  \b : word boundary
  \w : word character, \W : not a word character
  \d : digit, \D : not a digit
  \s : whitespace
  ^ : beginning of string
  $ : end of string (end of line)
  | : means or
  \d{3} : 3 digits
  \d? : optionally 0 or 1 digit
  \d{0,3} : 0 to 3 digits
  [A-Za-z0-9_\s]+ : will match a length 1 or longer string containing only letters, numbers, underscores, and whitespace

Rule 1: Surround expression with () to designate it as a group

# — how to use a wildcard character in the middle
dict_type = "global_parameters"
algorithm = "split"
data_type = "h01"
search_string = fr"{dict_type}.*{algorithm}.*{data_type}"
test_str = "global_parameters_hi_split_h01"
# from datasci_tools import regex_utils as ru
ru.match_substring_in_str(search_string, test_str)

datasci_tools.regex_utils.all_match_substring_in_str(substring, expression)[source]
datasci_tools.regex_utils.match_pattern_in_str(string, pattern, return_one=False, verbose=False)[source]

Purpose: To find the string that matches the pattern compiled

datasci_tools.regex_utils.match_substring_in_str(substring, expression)[source]
datasci_tools.regex_utils.multiple_replace(text, dict_map=None, pattern=None)[source]

Purpose: To replace multiple strings with a dictionary mapping

Ex:

from datasci_tools import regex_utils as ru
query = "u in [1,2,3,4]"
dict_mapping = dict(u="v", v="u")
ru.multiple_replace(query, dict_mapping)

datasci_tools.regex_utils.sub_str_for_pattern(s, pattern, replacement)[source]
datasci_tools.regex_utils.sub_str_for_pattern_with_count(s, pattern, replacement)[source]
datasci_tools.regex_utils.substr_from_match_obj(match_obj)[source]

datasci_tools.requirement_utils module

datasci_tools.scipy_utils module

Notes on the Delaunay triangulation: 1) it just makes a surface that encloses all points in a triangulation such that no point in P is inside the circumcircle of any triangle in DT(P).

The simplices just group the vertex indices into the groups that make up each triangle, e.g.:

array([[2, 3, 0],
       [3, 1, 0]])

find_simplex -> finds the simplices containing the given points (returns the simplex index)
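A short scipy.spatial.Delaunay sketch showing the simplices and find_simplex behavior described above (the points are made up):

import numpy as np
from scipy.spatial import Delaunay

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tri = Delaunay(points)

print(tri.simplices)                                  # rows of vertex indices, one triangle per row
print(tri.find_simplex([[0.2, 0.2], [2.0, 2.0]]))     # simplex index per point, -1 for points outside the hull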

datasci_tools.scipy_utils.example_delaunay_triangulation(points=None, plot_triangulation=True, plot_shaded_triangulation=True, verbose=False)[source]
datasci_tools.scipy_utils.linear_regression(x, y, verbose=False)[source]
datasci_tools.scipy_utils.model_fit(y, x=None, n_downsample=10, method='poly4')[source]

datasci_tools.seaborn_utils module

class datasci_tools.seaborn_utils.SeabornFig2Grid(seaborngrid, fig, subplot_spec)[source]

Bases: object

Purpose: Allows some seaborn plots to be used as subplots

source: https://stackoverflow.com/questions/35042255/how-to-plot-multiple-seaborn-jointplot-in-subplot

__init__(seaborngrid, fig, subplot_spec)[source]
datasci_tools.seaborn_utils.example_SeabornFig2Grid()[source]
datasci_tools.seaborn_utils.example_gridspec_from_existing_ax()[source]

Pseudocode: 1) Create the figure 2) Create the gridspec 3) Use SeabornFig2Grid to add the ax to the figure and reference the chosen gridspec

datasci_tools.setup_py_utils module

datasci_tools.statistics_utils module

datasci_tools.statistics_visualizations module

datasci_tools.string_utils module

datasci_tools.system_utils module

datasci_tools.tqdm_utils module

To provide a tqdm that can be controlled

Reference Article: https://github.com/tqdm/tqdm/issues/619

datasci_tools.tqdm_utils.practice_tqdm(n=10000)[source]
datasci_tools.tqdm_utils.tqdm

alias of _TQDM

datasci_tools.tqdm_utils.turn_off_tqdm()[source]
datasci_tools.tqdm_utils.turn_on_tqdm()[source]
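A minimal usage sketch based on the names listed above, assuming the tqdm alias is used like the regular tqdm wrapper:

from datasci_tools import tqdm_utils as tqu

tqu.turn_off_tqdm()              # progress bars become no-ops
for _ in tqu.tqdm(range(1000)):  # assumed to wrap an iterable like tqdm.tqdm
    pass

tqu.turn_on_tqdm()               # progress bars are shown again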

datasci_tools.typeing_utils module

datasci_tools.typeing_utils.foo(client_id: str) → list | bool[source]
datasci_tools.typeing_utils.headline(text: str, centered: bool = False) → str[source]

datasci_tools.widget_utils module

datasci_tools.widget_utils.add_on_change_func(w, f, out=None)[source]
datasci_tools.widget_utils.boolean_widgets = ['ToggleButton', 'CheckBox', 'Valid', 'Select', 'SelectionSlider', 'SelectionRangeSlider', 'ToggleButtons', 'SelectMutliple']

For options you can set the string label or provide (label, value) pairs

Ex:

widgets.Dropdown(
    options=[("One", 1), ("Two", 2), ("Three", 3)],
    value=2,
    description="Number:",
)

datasci_tools.widget_utils.clear_output(w, wait=False)[source]

wait=True will wait until the next time something is sent to the output before clearing it

datasci_tools.widget_utils.color_picker_widgets = ['ColorPicker']

Notes: Each container has a children attribute that can be set

datasci_tools.widget_utils.example_accordian()[source]
datasci_tools.widget_utils.example_add_function_to_button()[source]

Pseudocode: 1) Create a button and an output 2) Display the button and output 3) Create a function that prints to the output 4) Add the function to the button widget with .on_click (sketched below)
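A sketch of that sequence with plain ipywidgets (illustrative; not necessarily identical to this example function):

import ipywidgets as widgets
from IPython.display import display

button = widgets.Button(description='Click me')
output = widgets.Output()
display(button, output)

def on_button_clicked(b):
    with output:
        print('button was clicked')

button.on_click(on_button_clicked)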

datasci_tools.widget_utils.example_add_function_to_value_change()[source]

Purpose: The function that runs on the change should have an argument that accepts a dictionary holding the information of the change

datasci_tools.widget_utils.example_animation()[source]
datasci_tools.widget_utils.example_color_picker()[source]
datasci_tools.widget_utils.example_display_youtube_video(link='eWzY2nGfkXk')[source]
datasci_tools.widget_utils.example_function_to_output_widget(w=None, clear_output=True, wait=True)[source]
datasci_tools.widget_utils.example_image(filepath)[source]
datasci_tools.widget_utils.example_label_widget()[source]

Use Label when you want to create a custom label to use in a widget

datasci_tools.widget_utils.example_linking_interactive_function_with_output()[source]

Pseudocode: 1) Create sliders 2) Create a function to output a string of the slider values 3) Create an output widget 4) Stack the sliders and the output widget in one widget

datasci_tools.widget_utils.example_tabs()[source]
datasci_tools.widget_utils.selection_widgets = ['Dropdown', 'RadioButtons']
datasci_tools.widget_utils.set_label(w, label)[source]

Module contents