Array Functions
Functions for working with arrays and lists, including operations like distance calculations and array manipulation.
array
contains
Checks whether the array contains the specified element.
Parameters:
- arr (str | Column | Func | Sequence): Array to check for the element. If a string is provided, it is assumed to be the name of the array column. If a Column is provided, it is assumed to be an array column. If a Func is provided, it is assumed to be a function returning an array. If a sequence is provided, it is assumed to be an array of values.
- elem (Any): Element to check for in the array.
Returns:
- Func: A Func object that represents the contains function. The result of the function will be 1 if the element is present in the array, and 0 otherwise.
Example
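The semantics described above can be sketched in plain Python. This is only an illustration of the 1/0 result, not DataChain's implementation; the helper name `contains_ref` is hypothetical.

```python
from typing import Any, Sequence


def contains_ref(arr: Sequence[Any], elem: Any) -> int:
    """Hypothetical reference helper: 1 if elem is in arr, otherwise 0."""
    # The result is an int (1 or 0) rather than a bool, matching the
    # documented result type.
    return 1 if elem in arr else 0


print(contains_ref([1, 3, 5], 3))  # 1
print(contains_ref([1, 3, 5], 7))  # 0
```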
Notes
- The result column will always be of type int.
Source code in datachain/func/array.py
cosine_distance
Returns the cosine distance between two vectors.
The cosine distance is derived from the cosine similarity, which measures the angle between two vectors. This function returns the dissimilarity between the vectors: 0 indicates vectors that point in the same direction (identical vectors included), and larger values indicate higher dissimilarity.
Parameters:
- args (str | Column | Func | Sequence, default: ()): Two vectors to compute the cosine distance between. If a string is provided, it is assumed to be the name of the column vector. If a Column is provided, it is assumed to be an array column. If a Func is provided, it is assumed to be a function returning an array. If a sequence is provided, it is assumed to be a vector of values.
Returns:
- Func: A Func object that represents the cosine_distance function.
Example
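As a plain-Python sketch of the calculation, assuming the common definition of cosine distance as 1 minus the cosine similarity (the page does not state the exact formula). The helper name `cosine_distance_ref` is hypothetical and this is not DataChain's implementation.

```python
import math
from typing import Sequence


def cosine_distance_ref(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical reference helper: 1 - cosine similarity of a and b."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance_ref([3.0, 4.0], [3.0, 4.0]))  # 0.0 (same direction)
print(cosine_distance_ref([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```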
Notes
- Ensure both vectors have the same number of elements.
- The result column will always be of type float.
euclidean_distance
Returns the Euclidean distance between two vectors.
The Euclidean distance is the straight-line distance between two points in Euclidean space. This function returns the distance between the two vectors.
Parameters:
- args (str | Column | Func | Sequence, default: ()): Two vectors to compute the Euclidean distance between. If a string is provided, it is assumed to be the name of the column vector. If a Column is provided, it is assumed to be an array column. If a Func is provided, it is assumed to be a function returning an array. If a sequence is provided, it is assumed to be a vector of values.
Returns:
- Func: A Func object that represents the euclidean_distance function.
Example
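The straight-line distance described above can be sketched in plain Python. The helper name `euclidean_distance_ref` is hypothetical; this illustrates the formula only and is not DataChain's implementation.

```python
import math
from typing import Sequence


def euclidean_distance_ref(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical reference helper: square root of summed squared differences."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same number of elements")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance_ref([0.0, 0.0], [3.0, 4.0]))  # 5.0
```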
Notes
- Ensure both vectors have the same number of elements.
- The result column will always be of type float.
get_element
Returns the element at the given index from the array. If the index is out of bounds, it returns None or the column's default value.
Parameters:
- arg (str | Column | Func | Sequence): Array to get the element from. If a string is provided, it is assumed to be the name of the array column. If a Column is provided, it is assumed to be an array column. If a Func is provided, it is assumed to be a function returning an array. If a sequence is provided, it is assumed to be an array of values.
- index (int): Index of the element to get from the array.
Returns:
- Func: A Func object that represents the array get_element function.
Example
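A plain-Python sketch of the out-of-bounds behavior described above. It assumes 0-based indexing and treats negative indices as out of bounds; neither is stated on this page. The helper name `get_element_ref` and its `default` parameter are hypothetical, and this is not DataChain's implementation.

```python
from typing import Any, Optional, Sequence


def get_element_ref(
    arr: Sequence[Any], index: int, default: Optional[Any] = None
) -> Any:
    """Hypothetical reference helper: arr[index], or default if out of bounds."""
    # Assumption: indices are 0-based and negative indices are out of bounds.
    if 0 <= index < len(arr):
        return arr[index]
    return default


print(get_element_ref([10, 20, 30], 1))  # 20
print(get_element_ref([10, 20, 30], 5))  # None
```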
Notes
- The result column will always be the same type as the elements of the array.
join
Returns a string that is the concatenation of the elements of the array.
Parameters:
- arr (str | Column | Func | Sequence): Array to join. If a string is provided, it is assumed to be the name of the array column. If a Column is provided, it is assumed to be an array column. If a Func is provided, it is assumed to be a function returning an array. If a sequence is provided, it is assumed to be an array of values.
- sep (str, default: ''): Separator to use for the concatenation. Default is an empty string.
Returns:
- Func: A Func object that represents the join function.
Example
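The effect of the separator can be sketched in plain Python, assuming the array elements are strings. The helper name `join_ref` is hypothetical; this is not DataChain's implementation.

```python
from typing import Sequence


def join_ref(arr: Sequence[str], sep: str = "") -> str:
    """Hypothetical reference helper: concatenate elements, separated by sep."""
    return sep.join(arr)


print(join_ref(["a", "b", "c"]))       # abc
print(join_ref(["a", "b", "c"], ":"))  # a:b:c
```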
Notes
- The result column will always be of type string.
length
Returns the length of the array.
Parameters:
- arg (str | Column | Func | Sequence): Array to compute the length of. If a string is provided, it is assumed to be the name of the array column. If a Column is provided, it is assumed to be an array column. If a Func is provided, it is assumed to be a function returning an array. If a sequence is provided, it is assumed to be an array of values.
Returns:
- Func: A Func object that represents the array length function.
Example
Notes
- The result column will always be of type int.
sip_hash_64
Returns the SipHash-64 hash of the array.
Parameters:
- arg (str | Column | Func | Sequence): Array to compute the SipHash-64 hash of. If a string is provided, it is assumed to be the name of the array column. If a Column is provided, it is assumed to be an array column. If a Func is provided, it is assumed to be a function returning an array. If a sequence is provided, it is assumed to be an array of values.
Returns:
- Func: A Func object that represents the sip_hash_64 function.
Example
Notes
- This function is only available for the ClickHouse warehouse.
- The result column will always be of type int.
slice
slice(
    arr: Union[str, Column, Func, Sequence],
    offset: int,
    length: Optional[int] = None,
) -> Func
Returns a slice of the array starting from the specified offset.
Parameters:
- arr (str | Column | Func | Sequence): Array to slice. If a string is provided, it is assumed to be the name of the array column. If a Column is provided, it is assumed to be an array column. If a Func is provided, it is assumed to be a function returning an array. If a sequence is provided, it is assumed to be an array of values.
- offset (int): Starting position of the slice (0-based).
- length (int, default: None): Number of elements to include in the slice. If not provided, returns all elements from offset to the end.
Returns:
- Func: A Func object that represents the slice function.
Example
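The offset and length semantics described above can be sketched in plain Python, assuming a non-negative offset and length. The helper name `slice_ref` is hypothetical (chosen to avoid shadowing Python's built-in `slice`); this is not DataChain's implementation.

```python
from typing import Any, List, Optional, Sequence


def slice_ref(
    arr: Sequence[Any], offset: int, length: Optional[int] = None
) -> List[Any]:
    """Hypothetical reference helper: elements from a 0-based offset."""
    if length is None:
        # No length given: everything from offset to the end.
        return list(arr[offset:])
    # Otherwise take at most `length` elements starting at offset.
    return list(arr[offset:offset + length])


print(slice_ref([1, 2, 3, 4, 5], 1, 3))  # [2, 3, 4]
print(slice_ref([1, 2, 3, 4, 5], 2))     # [3, 4, 5]
```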
Notes
- The result column will be of type array with the same element type as the input.