TransWikia.com

PostGIS Clustering with MAX, MIN and SUM Limit of Attribute

Geographic Information Systems Asked on May 30, 2021

I have point data with number of peoples living in a house.

wkt111,members
SRID=4326;POINT(8.40771254932301 52.8935196103868),158
SRID=4326;POINT(8.35609033091261 52.8948388844689),122
SRID=4326;POINT(8.43633740614695 52.9290469721589),144
SRID=4326;POINT(8.43637376596481 52.9282851939852),144
SRID=4326;POINT(8.35612742505646 52.8953039259067),122
SRID=4326;POINT(8.40722533514059 52.8941621951891),158
SRID=4326;POINT(8.43974835467466 52.9295371710363),106
SRID=4326;POINT(8.4118531995535 52.893081792927),158
SRID=4326;POINT(8.4361907632209 52.9299321213647),144
SRID=4326;POINT(8.43860541290836 52.9287882893147),122
SRID=4326;POINT(8.43930763983385 52.929938063062),106
SRID=4326;POINT(8.43640308495477 52.92761686129),144
SRID=4326;POINT(8.43761323022107 52.9294136825416),122
SRID=4326;POINT(8.35565716705095 52.8947943141241),122
SRID=4326;POINT(8.40756478326444 52.8928834796089),116
SRID=4326;POINT(8.41197915975963 52.8922304811318),158
SRID=4326;POINT(8.35575141592804 52.8952636611962),122
SRID=4326;POINT(8.41199100509697 52.8928796332762),158
SRID=4326;POINT(8.35700296469407 52.8946903270974),122
SRID=4326;POINT(8.4081051389381 52.8927128902747),158
SRID=4326;POINT(8.40759193225009 52.8930786154292),116
SRID=4326;POINT(8.44019422628302 52.9295509228059),106
SRID=4326;POINT(8.41188170626939 52.8928475154659),158
SRID=4326;POINT(8.3562215746048 52.8948080186652),122
SRID=4326;POINT(8.43857527294205 52.929880156525),106
SRID=4326;POINT(8.4077242541519 52.8941146973073),158
SRID=4326;POINT(8.40775467492524 52.8922823085847),116
SRID=4326;POINT(8.43642552740359 52.927022173622),144
SRID=4326;POINT(8.35657574880137 52.8951443234909),122
SRID=4326;POINT(8.40848705845392 52.8920713315788),158
SRID=4326;POINT(8.40776048054732 52.893703405996),158
SRID=4326;POINT(8.43636411318751 52.9284504349415),144
SRID=4326;POINT(8.40741653275869 52.8934191858778),158
SRID=4326;POINT(8.35617545454214 52.8950776214254),122
SRID=4326;POINT(8.40823763848918 52.8923667872611),158
SRID=4326;POINT(8.43639461217891 52.9278189867993),144
SRID=4326;POINT(8.43853189619463 52.9295519446821),106
SRID=4326;POINT(8.40849838008488 52.8918852920387),158
SRID=4326;POINT(8.4389839529149 52.9295506923135),106
SRID=4326;POINT(8.40799108038239 52.8928973200872),158
SRID=4326;POINT(8.4381127659551 52.92906558007),122
SRID=4326;POINT(8.41209469177733 52.8922374887863),158
SRID=4326;POINT(8.40816140613239 52.8925378483298),158
SRID=4326;POINT(8.43985303594237 52.9299617157145),106
SRID=4326;POINT(8.3556053817816 52.8950242116499),122
SRID=4326;POINT(8.40793700015185 52.8930687863224),158
SRID=4326;POINT(8.38921628504428 52.8925246492037),112
SRID=4326;POINT(8.39473176195843 52.891683086891),112
SRID=4326;POINT(8.40739093896853 52.8944262688579),158
SRID=4326;POINT(8.40832773783288 52.8922089171594),158
SRID=4326;POINT(8.43937823057937 52.9295592639623),106
SRID=4326;POINT(8.44026594300409 52.9298270972058),106
SRID=4326;POINT(8.40735356098727 52.8935752290805),158
SRID=4326;POINT(8.43940727028956 52.9299369614778),106
SRID=4326;POINT(8.43634853799564 52.9286309912991),144
SRID=4326;POINT(8.35706977648743 52.8949976908418),122
SRID=4326;POINT(8.40775811103838 52.8938854805683),158
SRID=4326;POINT(8.39161917424377 52.8921613436314),112
SRID=4326;POINT(8.40780947454812 52.8944071474669),158
SRID=4326;POINT(8.43702951344901 52.9298314634418),122
SRID=4326;POINT(8.4375248181906 52.9295802736219),122
SRID=4326;POINT(8.43631696721841 52.9295696636839),144
SRID=4326;POINT(8.43628059188016 52.929764549172),144
SRID=4326;POINT(8.40727075744272 52.8939971673637),158
SRID=4326;POINT(8.4382991417383 52.9297268266898),106
SRID=4326;POINT(8.43633560151942 52.9288448256907),144
SRID=4326;POINT(8.43895687449178 52.9298073175027),106
SRID=4326;POINT(8.39273216869533 52.8920984174851),112
SRID=4326;POINT(8.43971473681796 52.9299635281786),106
SRID=4326;POINT(8.43799059571948 52.9298995849497),106
SRID=4326;POINT(8.43723205021406 52.9296960635931),122
SRID=4326;POINT(8.4075824477187 52.8928286205),116
SRID=4326;POINT(8.39351114085647 52.8916949848888),112
SRID=4326;POINT(8.41177354519619 52.8925962489103),158
SRID=4326;POINT(8.43834504310234 52.9289302986786),122
SRID=4326;POINT(8.44067887618606 52.9295576528102),106
SRID=4326;POINT(8.4363750768028 52.9280731928726),144
SRID=4326;POINT(8.40801362256092 52.8920202025446),116
SRID=4326;POINT(8.43632109984682 52.9292300058607),144
SRID=4326;POINT(8.40785381832656 52.8925270899493),116
SRID=4326;POINT(8.41171402780864 52.893658373826),158
SRID=4326;POINT(8.39014657911192 52.8922392583656),112
SRID=4326;POINT(8.35652068045738 52.8947780314362),122
SRID=4326;POINT(8.40701005464346 52.8943587912332),158
SRID=4326;POINT(8.40735082328216 52.8937746314017),158
SRID=4326;POINT(8.43640705432114 52.9274074791734),144
SRID=4326;POINT(8.41183287734992 52.8931357408673),158
SRID=4326;POINT(8.35682712053401 52.8944970640479),122
SRID=4326;POINT(8.4363201037938 52.9294231689194),144
SRID=4326;POINT(8.44057979522546 52.9297512515431),106
SRID=4326;POINT(8.39100985742862 52.8922600252125),112
SRID=4326;POINT(8.4378599136052 52.9293060812469),122
SRID=4326;POINT(8.41171179591219 52.8935838674255),158
SRID=4326;POINT(8.44105081209008 52.9298855598048),106
SRID=4326;POINT(8.43643577760023 52.9267984844396),144
SRID=4326;POINT(8.4120753326162 52.8926832890201),158
SRID=4326;POINT(8.40762272122407 52.8927048328018),116
SRID=4326;POINT(8.4119304894701 52.8924288327541),158
SRID=4326;POINT(8.43638879140181 52.9272268727532),144
  1. cluster within 2km.
  2. cluster should have min 100 or max 1000 of peoples/members. if exceeded the amount create a new cluster.
  3. sub-group should be a cluster if meet number 2.
    Note: there may be more than 1 sub-cluster in the cluster.

I have used the clusterDBscan, but not sure how to make a cluster with max and min sum of peoples.

2 Answers

I have tried some different methods this one gives me near results but not fully accurate.1

DROP FUNCTION IF EXISTS public.I_Cluster(int4, int4, int4, int4);
CREATE OR REPLACE FUNCTION "public"."i_cluster"("distance" int4, "household_min" int4,
"min_threshold" int4, "max_threshold" int4)

RETURNS TABLE("house" varchar, "cluster_id" int4) AS $BODY$ DECLARE
counter_cluster INTEGER := 1;
counter INTEGER := 0;
cluster_array CHARACTER VARYING [];
cluster_id INTEGER [];
cluster_next_loop_value CHARACTER VARYING;
total INTEGER := 0;
rec RECORD;
rec1 RECORD;
BEGIN
    FOR rec IN WITH res AS (
    SELECT
        A.haushalte,
        C.NAME,
        C.geom 
    FROM
        cluster_final A,
        house_polygons b,
        housecoordinates_final C 
    WHERE
        C."name" = b."coordname" 
        AND b."clustername" = A."name" 
        AND A.haushalte :: int4 > household_min
    ) SELECT A.*, st_distance ( st_transform ( A.geom, 3857 ), st_transform ( b.geom, 3857 ) ),
    b.NAME name1, b.haushalte haushalte1 
FROM
    res A,
    res b 
WHERE
    st_distance ( st_transform ( A.geom, 3857 ), st_transform ( b.geom, 3857 ) ) < distance 
ORDER BY
    A.NAME,
    st_distance ( st_transform ( A.geom, 3857 ), st_transform ( b.geom, 3857 ) ),
    b.NAME
LOOP
IF counter = 0 THEN
        cluster_array := ARRAY_APPEND( cluster_array, rec.name1 );
        cluster_id := ARRAY_APPEND( cluster_id, counter_cluster );
        total := total + rec.haushalte1 :: INTEGER;
    
END IF;
IF rec.name1 = ANY ( cluster_array ) THEN
        ELSE
    IF total < max_threshold THEN
        
            cluster_id := ARRAY_APPEND( cluster_id, counter_cluster );
            cluster_array := ARRAY_APPEND( cluster_array, rec.name1 );
            total := total + rec.haushalte1 :: INTEGER;
            --RAISE NOTICE'Cluster Value %', rec.name1;
            --RAISE NOTICE'Total %', total;

    ELSIF total > min_threshold THEN
            total := 0;
            counter_cluster := counter_cluster + 1;
            cluster_next_loop_value := rec.name1;
            --RAISE NOTICE'Second Loop %', cluster_next_loop_value;
            --RAISE NOTICE'Total %', total;
            FOR rec1 IN WITH res AS (SELECT
                    A.haushalte, C.NAME, C.geom 
                FROM
                    cluster_final A, house_polygons b, housecoordinates_final C 
                WHERE
                    C."name" = b."coordname" 
                    AND b."clustername" = A."name" 
                    AND A.haushalte :: int4 > household_min 
                ) 
                SELECT A.*,
                st_distance ( st_transform ( A.geom, 3857 ), st_transform ( b.geom, 3857 ) )::TEXT,
                b.NAME name1,
                b.haushalte haushalte1 
            FROM
                res A, res b 
            WHERE
                st_distance ( st_transform ( A.geom, 3857 ), st_transform ( b.geom, 3857 ) ) < distance 
            ORDER BY A.NAME
                , st_distance ( st_transform ( A.geom, 3857 ), st_transform ( b.geom, 3857 ) )
                , b.name
        LOOP
            IF cluster_next_loop_value=rec1.name1 THEN
            IF rec1.name = ANY ( cluster_array ) THEN
                ELSE
                total := total + rec1.haushalte1::int8;

                IF total < max_threshold THEN
                      cluster_id := ARRAY_APPEND( cluster_id, counter_cluster );
                      cluster_array := ARRAY_APPEND( cluster_array, rec1.name1);
ELSIF total > max_threshold THEN
                    total := 0;
                    counter_cluster := counter_cluster + 1;
                    END IF;
                END IF;
            END IF;
        END LOOP;
ELSIF total <= min_threshold THEN
    total := 0;
    END IF;
    END IF;
    counter := counter + 1;
    
END LOOP;
RETURN query SELECT UNNEST( cluster_array ), UNNEST ( cluster_id );
END; $BODY$ LANGUAGE plpgsql VOLATILE COST 100 ROWS 1000;
select row_number() over() id, * from i_cluster(2000, 100, 100, 1000) as a, housecoordinates_final b where a.house=b.name;

Correct answer by Muhammad Imran Siddique on May 30, 2021

To make your spatial table valid I suggest you to add an id column. In addition, I have added a cluster column which will store the clusters. Below is the table create query.

CREATE TABLE houses (
    id integer NOT NULL DEFAULT nextval('houses_id_seq'::regclass),
     members integer,
    geom GEOMETRY(POINT, 4326),
    CONSTRAINT houses_pkey PRIMARY KEY (id)
)

I added id column and added consecutive integers (1,2,3...), renamed wkt111 to geom. I made sure the order of columns in the DB table and csv table are the same. Then I imported your data using:

COPY houses FROM '/path/to/your/csvfile.csv';

Using this query below, the points are clustered by distance using ST_ClusterDBSCAN. And the geographic coordinate system (4326) is transformed to local projected coordinate system (4839) that uses meters, which you can change to any other.

SELECT *, ST_ClusterDBSCAN(ST_Transform(geom,4839), eps := 2000, minpoints := 1) OVER() AS cl_id FROM houses ORDER BY cl_id 

The loop basically uses the above result to further cluster the points based on the members count.

The maximum - 1000 members is enforced. The minimum 100 members can be enforced when you suggest where they should go.

The following anonymous function updates the cluster column with cluster numbers:

DO $$
  DECLARE
    arow record;
    counter int := 0;
    curr_cluster int := 0;
    previous_cluster int := 0;
  BEGIN
    FOR arow IN 
      SELECT *, ST_ClusterDBSCAN(ST_Transform(geom,4839), eps := 2000, minpoints := 1)
        OVER() AS cl_id FROM houses ORDER BY cl_id 
    LOOP
     --- It means this is a new cluster 
    IF  previous_cluster <> arow.cl_id THEN 
        -- Increment by one as it is a new cluster
        curr_cluster = curr_cluster + 1;  
        counter := arow.members; -- reset members counter
     --- looping in the same original cluster group so increment members normally
     ELSIF previous_cluster = arow.cl_id THEN 
        counter := counter + arow.members;
    
    END IF;
       previous_cluster := arow.cl_id;

      IF counter < 1001 THEN
    
        UPDATE houses SET cluster = curr_cluster WHERE id=arow.id;
    
      ELSIF counter > 1000 THEN
          counter := arow.members; -- reset members counter
          -- increment current clusterr members should be below 1000
          curr_cluster = curr_cluster + 1; 
         UPDATE houses SET cluster = curr_cluster WHERE id=arow.id;
       END IF;
     
     END LOOP;
 
  END;
$$;

Finally, you can style the points based on the cluster column by setting different colors for each cluster.

Answered by wondim on May 30, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP