API Reference ============= Droplet-Film Model Development Project — Technical Documentation. Overview -------- This document provides technical documentation for classes, methods, and parameters in the DFT Development project. The API is designed to be both powerful and user-friendly, supporting research and industrial applications. The project follows object-oriented design with clear separation between physics modeling, data management, and machine learning components. Core Classes and Modules ------------------------- The project consists of several key modules: - **dft_model.py**: Core physics model implementation - **utils.py**: Data management and utility functions - Individual Jupyter notebooks for different approaches DFT Class — Core Physics Model ------------------------------ The DFT class implements the Droplet-Film Model for predicting critical flow rates in gas wells. Class definition ~~~~~~~~~~~~~~~~ .. code-block:: python class DFT: """ Droplet-Film Model for predicting critical flow rates in gas wells. This class implements a physics-informed machine learning approach that combines fundamental fluid dynamics principles with data-driven optimization to predict when gas wells will experience liquid loading. """ Constructor ~~~~~~~~~~~ .. code-block:: python __init__(self, seed=42, feature_tol=1.0, dev_tol=1e-3, multiple_dev_policy="max") **Parameters:** - **seed** (int): Random seed for reproducibility. Default: 42 - **feature_tol** (float): Feature distance threshold for matching. Default: 1.0 - **dev_tol** (float): Deviation tolerance for angle matching. Default: 1e-3 - **multiple_dev_policy** (str): Policy for handling multiple matches. Options: "max", "min", "mean", "median". Default: "max" **Attributes:** seed, feature_tol, dev_tol, multiple_dev_policy, opt_params (set after fitting), n_train (set after fitting). Methods — fit ~~~~~~~~~~~~~ .. code-block:: python fit(self, X, y) Train the DFT model on provided data. **Parameters:** X (np.ndarray) shape (n_samples, 10), y (np.ndarray) shape (n_samples,). **Returns:** self. **Features (in order):** Dia, Dev(deg), Area (m2), z, GasDens, LiquidDens, g (m/s2), P/T, friction_factor, critical_film_thickness. **Implementation:** Uses Powell optimization from scipy.optimize; optimizes 5 global parameters (p1–p5) plus alpha per sample; bounds alpha in [0, 1]; max 5000 iterations, 10000 function calls. Methods — predict ~~~~~~~~~~~~~~~~~ .. code-block:: python predict(self, X, dev_train=None, alpha_strategy='enhanced_dev_based') Make predictions on new data. **Parameters:** X (np.ndarray), optional dev_train, alpha_strategy (must be 'enhanced_dev_based'). **Returns:** np.ndarray of shape (n_samples,). **Alpha assignment strategy (by well deviation angle):** 1. Dev < 10°: Regular deviation-based matching - Find training samples within dev_tol - Apply multiple_dev_policy if multiple matches - Use mean training alpha if no matches 2. 10° ≤ Dev < 20°: Minimum alpha strategy - Find training samples within dev_tol - Use minimum alpha among matches - Use mean training alpha if no matches 3. Dev ≥ 20°: Full-feature matching - Compute Euclidean distance to all training samples - Use closest sample's alpha if distance < feature_tol - Use mean training alpha otherwise Methods — _eq (physics equation) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python _eq(self, params, X) Compute predicted values using the physics equation. **Physics equation:** .. math:: Q_{cr} = p_1 \\sqrt{\\left| \\mathrm{term}_1 \\cdot \\alpha + (1-\\alpha) \\cdot \\mathrm{term}_2 \\right| \\cdot (1/z) \\cdot (P/T) } Where: - term1 involves :math:`2 g \\, \\mathrm{Dia}`, :math:`(\\rho_l - \\rho_g)`, :math:`\\cos(\\mathrm{Dev})`, and parameters p4. - term2 involves :math:`|\\sin(p_5 \\cdot \\mathrm{Dev})|^{p_3}` and :math:`(\\rho_l - \\rho_g)^{p_2} / \\rho_g^2`. Methods — _loss ~~~~~~~~~~~~~~~ .. code-block:: python _loss(self, params) Compute loss function for optimization. **Returns:** float (MSE). Helm Class — Data Management ----------------------------- The **Helm** class (in ``models.utils``) handles dataset loading, train/test splitting, scaling, and model training/evaluation. The CSV must include the feature columns, plus **Qcr** (target), **Gasflowrate**, and **Test status**. Class definition ~~~~~~~~~~~~~~~~ .. code-block:: python from models.utils import Helm Constructor ~~~~~~~~~~~ .. code-block:: python __init__(self, path, seed=42, drop_cols=None, includ_cols=None, test_size=0.20, scale=True) **Parameters:** path (str), seed (int), drop_cols, includ_cols (lists), test_size (float, default 0.20), scale (bool, default True). **Attributes (set after initialization):** X_train, X_test, y_train, y_test (numpy arrays). If scale=True, use X_train_rdy, X_test_rdy (and optionally y_train_rdy, y_test_rdy) for scaled data. Also: feature_names, scaler_X, scaler_y (when scale=True). Methods — evolv_model ~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python evolv_model(self, build_model, hparam_grid, k_folds=5) Train model with hyperparameter optimization. **Returns:** best trained model. Performs grid search and k-fold cross-validation; stores predictions and metrics. QLatticeWrapper Class — Symbolic Regression ------------------------------------------- Wrapper for Feyn QLattice for automated symbolic regression with a scikit-learn compatible interface. Constructor ~~~~~~~~~~~ .. code-block:: python __init__(self, feature_tags, output_tag="Qcr", seed=42, max_complexity=10, n_epochs=10, criterion="bic") **Parameters:** feature_tags (List[str]), output_tag, seed, max_complexity, n_epochs, criterion ("bic", "aic", "r2"). **Methods:** fit(X, y), predict(X), express() (returns SymPy expression). Data Format Requirements ------------------------ Input CSV must contain the 10 features (Dia, Dev(deg), Area (m2), z, GasDens, LiquidDens, g (m/s2), P/T, friction_factor, critical_film_thickness) plus **Qcr**, **Gasflowrate**, and **Test status** for Helm. Data Validation, Error Handling, Performance --------------------------------------------- Helm loads CSV and splits by stratify=loading (Test status). The API includes error handling for invalid inputs, missing columns, optimization failures, and (for QLattice) network issues. Support and Resources --------------------- GitHub repository, documentation, community forums, issue tracker, and research papers. For examples, see Usage Examples and the Jupyter notebooks.