Tuesday, January 22, 2013

DB2 for z/OS Data Sharing: the Lock List Portion of the Lock Structure

With DB2's data sharing capability having been introduced back in the mid-1990s with DB2 for z/OS Version 4, a great many people -- including lots of folks whose organizations do not run DB2 in data sharing mode -- are quite familiar with the technology (for people new to this subject area, I posted, a few years ago, a "What is DB2 Data Sharing?" entry to the blog that I maintained while working as an independent DB2 consultant). Still, there are some aspects of DB2 data sharing technology that are more widely understood than others. It's recently come to my attention that among mainframe DB2 people there is a bit of a knowledge gap with regard to a particular component of a DB2 data sharing configuration. The component to which I refer here is the lock list portion of the DB2 lock structure, and with this blog entry my aim is to fill that knowledge gap to some degree.

The lock structure is of course one of three coupling facility structure types that DB2 uses to make data sharing work (the other two structure types are the Shared Communications Area and the group buffer pools). Within the lock structure you have the lock table and the lock list. The lock table is, conceptually, at work all the time in a DB2 data sharing environment. It's essentially a mechanism through which global lock contention is detected: when a process running on a member of a DB2 data sharing group (I'll call this member DB2A) wants a lock on something (e.g., a data page or row) that might be locked by a process running on another member of the group, the item on which the lock is being requested is mapped by way of a hashing algorithm to an entry in the lock table. That lock entry is checked for the presence of an incompatible lock held by a process running on another member of the group, and if that turns out to be the case (e.g., a process running on member DB2B has an X-lock on a page on which the process running on DB2A wants an S-lock) then the contention indication is sent back from the coupling facility holding the lock structure to the z/OS LPAR on which member DB2A is running. [Because there will almost certainly be many more lockable "things" in the database than entries in the lock table, some different lockable things will map to the same lock table entry; thus, contention can be falsely detected. Typically, "false contention" is the initial result for a very small percentage of global lock requests, and when this occurs the contention situation is determined to be false in a quick and efficient manner.]

Because the lock table is vitally important for the moment-by-moment operation of a DB2 data sharing group, it gets a lot of attention. By comparison, the lock list portion of the lock structure is usually not front-of-mind when people thing about DB2 data sharing. It's important, sure -- but in an insurance policy kind of way: you're glad it's there, but you'd be pleased if it were never needed. Here's what I mean by this analogy: the purpose of the lock list is to store information about X-type global locks held by members of the data sharing group on behalf of processes executing on the various members ("X-type" refers to locks associated with data-changing actions, and "global" refers to locks of which members other than the lock-acquiring member might need to be aware). In the event of the abnormal termination of a member DB2 subsystem, information about the X-type global locks held by that member at the time of failure will be, in effect, downloaded from the lock list to the surviving members of the group. These locks will be treated by the surviving members as "retained" locks, and they will continue to be retained (and the associated locked items, such as data pages and/or rows, will remain unavailable) until the failed DB2 subsystem releases the locks in the course of restart processing (restart of a failed DB2 member will usually be initiated automatically by the system, and will often complete in less than 2 minutes). Knowing about these retained locks is important for data integrity-preservation purposes: they keep surviving subsystems from accessing data pages or rows that were in the process of being changed through another member when that member failed. The retained locks "seal off" affected pages and/or rows (depending on the granularity of locking used for a table space) until the failed member can, as part of restart processing, roll back the units of work that were in-flight in the subsystem at the time of failure and write to disk those committed data changes that had not yet been externalized at the time of the failure event.

So, what does the lock list need in order to do its job? It needs enough space to hold information about all the X-type global locks held at a given time by the members of the DB2 data sharing group. Often people supporting DB2 data sharing systems don't think much about this, and usually they don't need to: these folks take the recommended approach of making the size of the DB2 lock structure a power of two (e.g., 64 MB, specified by way of the INITSIZE parameter for the structure in the system's coupling facility resource management policy, also known as the CFRM policy). When that's done, half the space in the lock structure will be used for the lock table, and half will be used for the lock list (the lock table size MUST be a power of two). In most cases, this approach provides plenty of lock list space (and that's good -- don't get stingy here).

Suppose your particular situation falls outside of the "most cases" norm? I recently encountered an interesting set of circumstances at a DB2 for z/OS site that made the "power of two" standard for lock structure size inappropriate. At this site, in a test environment, the lock list in a 327 MB lock structure (that's really big) got full. Let me put this in perspective for you: 327 MB is not a power of two. So, how big was the lock list in this lock structure? As mentioned previously, the lock table portion of a lock structure MUST be sized at a power of two. When the lock structure INITSIZE is not a power of two, upon initial structure build (or subsequent rebuild) the lock table portion of the structure will be sized at the power of two that will result in as close to a "50-50" split of space as possible between the lock table and the lock list. For a 327 MB lock structure that power of two for the lock table will be 128 MB (versus 64 MB or 256 MB). Subtract the 128 MB for the lock table from the 327 MB for the overall lock structure, and what you have left is 199 MB for the lock list. Will a lock list of that size have enough room to store a lot of global X-lock information? Yes: at a little over 300 bytes per lock list entry, a 199 MB lock list can accommodate over 600,000 list entries. That kind of capacity would typically be more than enough in a DB2 data sharing system, but in the situation to which I'm referring it fell short of the need. Why? Because at this site the migration of an application from a non-DB2 database to DB2 is requiring, at least for a time, some very large units of work (with respect to the number of update locks acquired by a program between commits). The ZPARM parameters NUMLKTS and NUMLKUS had been set to pretty high values to accommodate this requirement, but as mentioned the lock list in the test system did fill up and programs needing additional global X-locks abended as a result.

At this site, the lock list needs to be made larger (in fact, considerably larger), and that's going to happen -- but not in the way you might suppose. You might think that in this situation you'd just make the lock structure big enough (referring to INITSIZE) so that the lock list will be as big as needed, keeping in mind that, as noted, the size of the lock table part of the structure will be the power of two that gets you as close as possible to a 50-50 space split between the lock table and the lock list. In fact, that is NOT what this organization needs to do, for this simple reason: the lock table portion of the current lock structure, at 128 MB, is already plenty big and doesn't need to get any bigger (the adequacy of the lock table's size is indicated by a very low rate of false contention, which I briefly described above). The solution here is to keep the lock structure's INITSIZE as-is, and to substantially boost the value of the SIZE parameter for the structure in the CFRM policy. The SIZE value specifies the size up to which a coupling facility structure can be dynamically increased (by way of a SETXCF command) after it has been initially built (or rebuilt). Now, here's the really important point in the context of the referenced organization's need for substantially more lock list space: when a lock structure is made dynamically larger by way of a SETXCF command (and again, this can only be done if the value of SIZE exceeds the value of INITSIZE for the structure in the CFRM policy), ALL of the added space is used for the lock list -- the size of the lock table does not change (a larger INITSIZE and a structure rebuild would be needed to increase the size of the lock table). So, if INITSIZE for a lock structure is, say, 256 MB, and SIZE for the structure is 512 MB, and if the structure is initially built and then 100 MB is dynamically added to the structure's size through execution of a SETXCF command, the effect will be a 100 MB increase in the size of the lock list (which should give you room for an extra 320,000 or so additional lock list entries).

Another thing about dynamically increasing the size of the lock list: suppose, as in the example at the end of the preceding paragraph, you had a lock structure with an INITSIZE of 256 MB and a SIZE of 512 MB, and after initial structure build you dynamically increased the size of the structure by 100 MB. That would leave you with another 156 MB of available dynamic-size-increase capability. How would you know when to use that? IRLM (DB2's lock manager component) will tell you. If the lock list gets pretty full (50% full), IRLM will issue message DXR170I. That message will be issued again if the lock list becomes 60% full, and again if it gets to 70% full. If the lock list gets really full (80%), IRLM will issue message DXR142E. That message will be issued again if the lock list gets to 90% full, and again if it gets to 100% full, but you'd want to act on the 80% full message because at 90% full you're likely to start getting application program abends.

In summary: the lock list portion of the lock structure serves its purpose when a DB2 subsystem in a data sharing group fails, and that tends to be a pretty rare event. Still, you need to make sure that you have enough space in your lock list to hold all the global X-locks that might accumulate in your DB2 data sharing system, because running short on this space could result in application program abend events. In my experience, the lock list space that people have is usually plenty big for their needs, but there are situations in which the accumulation of a really large number of X-type global locks across a data sharing group can be expected. If that's the case at your site, be ready with a big-enough lock list that you can dynamically enlarge if needed. In other words, make sure that this insurance policy is in order and reflects the characteristics of your DB2 application environment.

Wednesday, January 9, 2013

DB2 for z/OS: An Interesting "Flip Side" to Native SQL Procedures

I heard the other day from a DB2 for z/OS DBA who communicated to me an interesting story. His organization had recently gotten into development and deployment of native SQL procedures on the mainframe DB2 platform. The native SQL procedures were working as advertised, but the move to utilize this DB2 technology had engendered grumbling on the part of some of the organizations' application programmers. In this entry I'll explain the basis of the developers' complaints, and I'll describe an approach that other organizations have taken to proactively address similar situations.

It's no secret that I've been bullish on DB2 for z/OS native SQL procedures since the get-go. I've done a lot of presenting and writing on the topic, starting with an entry that I posted back in 2008 to the blog I maintained while working as an independent consultant. Native SQL procedures were introduced with DB2 9 for z/OS in new-function mode (or with DB2 10 in new-function mode, for companies that migrated directly from DB2 Version 8 to DB2 10), and they deliver multiple benefits: significant processing offload to less-expensive zIIP engines on System z servers (when called by network-connected DRDA requesters), execution efficiency (they run in the DB2 DBM1 address space, versus an external-to-DB2 stored procedure address space), and simplified stored procedure management (there are no program object modules or load modules to manage -- a native SQL procedure's "executable" is its DB2 package). A lot to like.

So, what caused the aforementioned discontent among some of the application development colleagues of the DBA who shared his experience with me? In a nutshell, it was this: developing native SQL procedures ups the value of a programmer's ability to write sophisticated SQL statements, and a number of the more senior mainframe application developers at the organization to which I've referred felt that they were not prepared for this circumstance (the DBA explained that this underlying tension came to the fore when it was found that a SQL statement in a native SQL procedure performed more efficiently when it included a common table expression versus relying on a declared global temporary table).

I absolutely get this, and it very much does not reflect poorly on the skills and smarts of the programmers in question. Several of these individuals had been engaged for a long time in writing COBOL programs that included very simple SQL statements. I don't just mean simple in terms of an absence of things like common table expressions, CASE expressions, and CAST specifications. I mean simple as in not involving multi-table access via JOINs and/or UNIONs. Why is it that these folks generally wrote simple statements that didn't come close to leveraging the power of SQL? In large part -- and I've certainly seen this before -- it was because many of the programs running on the mainframe system had originally been written to access data in a non-relational DBMS and had been converted to access DB2 data. The programming interface to the non-relational DBMS was record-oriented, as opposed to being set-oriented. A program would access a record in a file, and if that record needed to be matched with one in another file then another single-record request would target that second file, using selection criteria obtained from the first retrieved record. When the database was migrated to DB2, the structure and logic of the programs that accessed the non-relational database were preserved, with very simple, single-row and single-table SQL statements taking the place of statements used to access data prior to the DB2 migration. This is not an uncommon approach for an organization to take when migrating data from a non-relational database to DB2: for the sake of expediency, existing database-accessing programs are converted to access DB2 with as little change as possible, and in terms of more fully exploiting SQL -- and DB2 for z/OS technology in general -- the focus is put on development of new applications. Consequently, you can end up with a group of programmers -- otherwise highly skilled -- who don't have much SQL knowledge beyond coding very simple statements because they haven't been asked to do more than that. Tell these folks to start developing native SQL procedures that are written entirely in SQL, and you might find that some of them feel under-prepared for that task. It's a recipe for frustration.

There are mainframe DB2-using companies that have anticipated this challenge and responded by creating a new role focused on facilitating the development and deployment of DB2 routines such as stored procedures (which could be native SQL procedures) and user-defined functions (which, starting with DB2 10 for z/OS in new-function mode, can be written using SQL Procedure Language). At one such company the name "procedural DBA" was given to the new role (at that company, some of the new "procedural DBAs" worked previously as traditional DB2 DBAs, while others were former application programmers who were keenly interested in building their SQL coding skills). The idea is to have a center of DB2 SQL excellence through which SQL coding knowledge (including development of native SQL procedures) can be diffused to the wider group of people in the organization who write programs that access DB2 data. I wrote about this new DB2-related role in an article that appeared in 2011 in IBM Data Management Magazine, and I blogged about the value to an organization of SQL coding skills in an entry posted to this blog, also in 2011.

Besides a role aimed at helping people to develop DB2 routines using SQL Procedure Language, make sure that programmers have access to helpful documentation. SQL Procedure Language statements (including statements that control logic flow in a routine) are documented in the DB2 for z/OS SQL Reference for DB2 9 and DB2 10. Information about creating native SQL procedures can be found in the DB2 for z/OS Application Programming and SQL Guide for DB2 9 and DB2 10. Another very useful source of information is the IBM "red book" titled, "DB2 9 for z/OS Stored Procedures: Through the CALL and Beyond." And, make sure that people are aware of forums through which DB2 people help other DB2 people with all kinds of questions, including those pertaining to native SQL procedure development. A personal favorite of mine is the DB2-L forum sponsored by the International DB2 Users Group (just go to www.idug.org, request -- for FREE -- a login if you don't already have a member ID, and click on the "DB2-L & Forums" link near the top of the IDUG home page).

SQL Procedure Language (SQL PL, for short) is important and getting more so in a DB2 for z/OS context (as previously mentioned, SQL PL can be used to create user-defined functions in a DB2 10 new-function mode environment, and SQL PL got a nice performance boost with DB2 10). People in your organization who have not heretofore written routines entirely in SQL can do so, effectively and efficiently, but that ability doesn't magically arise. Let people know that SQL coding help is available -- through SQL-knowledgeable individuals in your organization, in the DB2 documentation, and via DB2 forums -- and get ready to be impressed by what they can do. People like to learn new things. Give them that opportunity.