While working on SQL Challenge #19, the challenge's writers provided a small sample data set of just 10 records. Purportedly a larger data set existed too, but I couldn't find the larger data set in the Wayback Machine (where I had to go in order to find the actual Challenge description dated 2009). I wanted a larger set in order to test the scalability of various queries.


 


So I had to generate my own hierarchy.  As usual, I had to thrash around a bit, but my data-generator seemed to work just fine.


 


However, I noticed in retrospect that the generated hierarchical data set was "complete", not "ragged." In other words, all the leaf-nodes were at the bottom two levels of the hierarchy. A realistic organizational hierarchy would be "ragged": the CEO could have some direct reports, which in turn managed no one -- these would probably be trusted advisors, consultants, or staff positions (positions much as I'd like to have: highly -paid, -regarded, but with no "line" responsibilities... :)


 


I think I've succeeded (regarding the data generation, not the sinecure). The following query generates a hierarchy. The top node (the "CEO", if you will) gets a full span-of-control, but everyone underneath has a 20% chance of having no subordinates ("20%" is an approximate statement, not a formal probability). 


 


I bolded the query change from last time, and specifically list the "skipped" nodes (members of the organization without direct reports):


 


WITH

--------------------------------------------------------------------------------
-- Parameters (single record)
--------------------------------------------------------------------------------
parm AS (
SELECT 100 AS NewEmployeeCnt -- start with zero employees, then count
, 3 AS NewSubordinateCnt -- start with zero per manager, then count
FROM DUAL
),------------------------------------------------------------------------------
-- employees and hierarchy. PK = employeeID
--------------------------------------------------------------------------------
emp_hier ( employeeID, EmployeeName, ReportsTo
, NewSubordinateCnt, NewEmployeeCnt, MAXSubCnt
) AS (
SELECT 1 AS employeeID
, TO_CHAR( to_date(1,'J'),'Jsp') AS EmployeeName
, 1 AS ReportsTo -- selfish at the top
, parm.NewSubordinateCnt
, parm.NewEmployeeCnt
, parm.NewSubordinateCnt AS MAXSubCnt
FROM parm
UNION ALL
SELECT emp_hier.employeeID + 1
, TO_CHAR( to_date(emp_hier.employeeID + 1,'J'),'Jsp') AS EmployeeName
, CASE WHEN NewSubordinateCnt > 1
THEN emp_hier.ReportsTo
ELSE emp_hier.ReportsTo + 1
+ CASE WHEN DBMS_RANDOM.VALUE > 0.2
THEN 0 ELSE 1 -- 20% ragged
END
END AS ReportsTo
, CASE WHEN NewSubordinateCnt > 1
THEN NewSubordinateCnt - 1
ELSE ROUND(DBMS_RANDOM.VALUE*emp_hier.MAXSubCnt,0)
END AS NewSubordinateCnt
, NewEmployeeCnt - 1 AS NewEmployeeCnt
, emp_hier.MAXSubCnt
FROM emp_hier
WHERE NewEmployeeCnt >= 1
) ------------------------------------------------------------------------------
-- Pretty-up the results
--------------------------------------------------------------------------------
SELECT employeeID
, SUBSTR(EmployeeName,1,1) -- AS FirstName (initial)
||'. '
||InitCap(SUBSTR(EmployeeName,2,999)) -- AS LastName
AS EmployeeName
, CASE WHEN ReportsTo = employeeID
THEN NULL ELSE ReportsTo
END AS ReportsTo
, CASE WHEN ReportsTo
-LAG(ReportsTo)OVER(ORDER BY employeeID) > 1
THEN ReportsTo-1 END AS jumped
FROM emp_hier
ORDER BY 1


 


The first part of the results when I ran it are as follows:


 


EMPLOYEEID EMPLOYEENAME     REPORTSTO     JUMPED

---------- --------------- ---------- ----------
1 O. Ne
2 T. Wo 1
3 T. Hree 1
4 F. Our 2
5 F. Ive 3
6 S. Ix 3
7 S. Even 4
8 E. Ight 4
9 N. Ine 6 5
10 T. En 6
11 E. Leven 8 7
12 T. Welve 9
13 T. Hirteen 9
14 F. Ourteen 11 10
15 F. Ifteen 11
16 S. Ixteen 12
17 S. Eventeen 13
18 E. Ighteen 13
19 N. Ineteen 13
20 T. Wenty 14
21 T. Wenty-One 15
22 T. Wenty-Two 15
23 T. Wenty-Three 16
24 T. Wenty-Four 17
25 T. Wenty-Five 18
26 T. Wenty-Six 18
27 T. Wenty-Seven 19
28 T. Wenty-Eight 19
29 T. Wenty-Nine 20
30 T. Hirty 20
31 T. Hirty-One 20
32 T. Hirty-Two 21
33 T. Hirty-Three 22
34 T. Hirty-Four 22
35 T. Hirty-Five 23
36 T. Hirty-Six 23
37 T. Hirty-Seven 24
38 T. Hirty-Eight 24
39 T. Hirty-Nine 25
40 F. Orty 25
41 F. Orty-One 26
42 F. Orty-Two 26
43 F. Orty-Three 27
44 F. Orty-Four 29 28
...


 


I'm happier. As you can verify, employees/nodes 5, 7, 10 and 28 have no subordinates.


 


<END>


 


Here's a summary of another run, where you can see there are "leaves" (employees without direct reports) scattered throughout the hierarchy.


 



...
select depth#
, sum(case when leafer = 0 then 1 end) as interior
, sum(case when leafer = 1 then 1 end) as leaf
, count(*) as node_cnt
from (
select r.*
, LEVEL as depth#
, CONNECT_BY_ISLEAF AS Leafer
from results r
start with Reportsto IS NULL
connect by prior employeeID = ReportsTo
) group by depth# order by 1,2
SQL> /

DEPTH# INTERIOR LEAF NODE_CNT
------ ---------- ---------- ----------
1 1 1
2 2 2
3 2 1 3
4 4 4
5 7 7
6 5 2 7
7 6 2 8
8 8 2 10
9 8 3 11
10 10 4 14
11 8 11 19
12 15 15


 


Looks good.


 






 


The image is my rendering of the SQL Challenge 19 small-sample data set.