How do I get built-in functions to use custom objects as if they were e.g. List?

Mathematica Asked by user75893 on May 9, 2021

(This is part 2 of my earlier question, Can I overload Part for specific heads (a la overloaded array subscripting)?, which was well-answered as asked.)


My goal is to create a custom object (with the basic behavior of a List, for example) that lazy-loads its data, which might be much larger than physical memory, or distributed across many machines, or computed on demand as part of a view, etc. (This is very loosely analogous to an mmap'd file in e.g. C, or perhaps even to a SparseArray.) These objects will, in turn, be used as basic building blocks to construct a ~column-store database (think "a Wolfram Language implementation of ~kdb+").

In terms of "direct invocation" of code (i.e., I type the code in the front end), UpValues get me what I want to see. Now the question is:

How can I get built-in functions to recognize/use these objects?

Here’s what I have so far (which successfully uses UpValues to get me the Part behavior I want):

In[1]:= array /: Part[b_array, i_Integer] := arrayGetPart[b, i]

In[2]:= array /: Part[array[assoc_Association], s_String] := assoc[s]

In[3]:= arrayGetPart[a_array, i_Integer] := 
 If[1 <= i <= a[["maxLength"]], i, Missing[]]

In[4]:= a = 
 array[<|"file" -> "a-really-big-column-file", "maxLength" -> 1000000000000|>]

Out[4]= array[<|"file" -> "a-really-big-column-file", "maxLength" -> 1000000000000|>]

In[5]:= a[["file"]]

Out[5]= "a-really-big-column-file"

In[6]:= {a[[-1]], a[[1]], a[[1001]], a[[1000000000000]], a[[1000000000001]]}

Out[6]= {Missing[], 1, 1001, 1000000000000, Missing[]}

So far, so good! But what about our friend Mean?

In[7]:= Mean[a]

Out[7]= Mean[
 array[<|"file" -> "a-really-big-column-file", "maxLength" -> 1000000000000|>]]

Minimally it seems I would need to implement e.g. Length[a_array] and other "fundamental accessors"….

If that’s a correct approach for Mean (where it is spelled out in the documentation that Mean[list] is equivalent to Total[list]/Length[list]), what about for other built-in Wolfram Language functions? How, in general, can I learn what primitive accessors I will need to implement, and how do I then inject those rules into the system?
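To make the "fundamental accessors" idea concrete, here is a minimal sketch, not a general answer. It attaches Length and Mean to array as UpValues, reusing the toy definitions above in which element i is simply i. Overloading Mean itself is an assumption about what might be needed, since built-in functions are not documented to route through top-level Length or Part. The names small and "toy-column" are made up for illustration.

```wolfram
(* Toy definitions from above: element i of the array is just i *)
ClearAll[array, arrayGetPart];
array /: Part[array[assoc_Association], s_String] := assoc[s]
array /: Part[b_array, i_Integer] := arrayGetPart[b, i]
arrayGetPart[a_array, i_Integer] :=
 If[1 <= i <= a[["maxLength"]], i, Missing[]]

(* "Fundamental accessor": Length reads the stored metadata *)
array /: Length[a_array] := a[["maxLength"]]

(* Hypothetical explicit overload: Mean as a streaming sum over Part,
   so the elements are never held in memory all at once *)
array /: Mean[a_array] :=
 Module[{total = 0},
  Do[total += a[[i]], {i, Length[a]}];
  total/Length[a]]

(* Hypothetical small instance, so the loop finishes quickly *)
small = array[<|"file" -> "toy-column", "maxLength" -> 10|>];

Length[small]  (* 10 *)
Mean[small]    (* 11/2 *)
```

UpValues are tried before the built-in definitions of Length and Mean, which is why no Unprotect is needed here. The cost is that each built-in has to be overloaded explicitly, which is exactly the burden the question hopes to avoid.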

Random remarks:

I am vaguely aware of (but have not used) MathLink, but it would be very much preferred to have a pure Wolfram Language implementation, even if that means some sacrifice in performance.

I have also played with the RelationalDatabase functionality, and this is broadly the direction I am heading, what with its style of "in-language" queries (albeit modeled around EntityStore). Alas, RelationalDatabase has abysmal performance and is closed, in the sense that one cannot add new operators to its query language (which is then translated into ~SQL queries against the underlying database).

What I also want to avoid is having to reimplement every function that I might want to use in a query against the database – that is obviously possible, but it defeats the point of using a rich language such as Wolfram Language, and it would create a shadow/pidgin language that no one including myself would want to use. So, ideally, I’d like to get away with "a few" underlying primitive operations to do with e.g. List access….
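One possible stopgap, sketched here on the assumption that the data (or a slice of it) fits in memory, is to give array a Normal UpValue, much as Normal turns a SparseArray into an ordinary List. The materialized List can then be handed to any built-in. This gives up laziness, so it illustrates the trade-off rather than providing the small set of primitives asked about. As before, small and "toy-column" are hypothetical names.

```wolfram
(* Toy definitions from above: element i of the array is just i *)
ClearAll[array, arrayGetPart];
array /: Part[array[assoc_Association], s_String] := assoc[s]
array /: Part[b_array, i_Integer] := arrayGetPart[b, i]
arrayGetPart[a_array, i_Integer] :=
 If[1 <= i <= a[["maxLength"]], i, Missing[]]
array /: Length[a_array] := a[["maxLength"]]

(* Materialize the elements as an ordinary List via Part and Length *)
array /: Normal[a_array] := Table[a[[i]], {i, Length[a]}]

(* Hypothetical small instance *)
small = array[<|"file" -> "toy-column", "maxLength" -> 10|>];

Normal[small]          (* {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} *)
Median[Normal[small]]  (* 11/2 *)
Total[Normal[small]]   (* 55 *)
```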
