[Koha-devel] Formatting control number searches

Barton Chittenden barton at bywatersolutions.com
Fri Jul 27 23:11:13 CEST 2018


Just for reference, I figured out how to do the OCLC number reformatting in
XSLT.

If we have a file 'format_oclc.marcxml':

<?xml version="1.0" encoding="UTF-8"?>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="w">(OCoLC)2776588</subfield>
    <subfield code="w">(OCoLC)12776588</subfield>
    <subfield code="w">(OCoLC)112776588</subfield>
  </datafield>
</record>

And an xslt file 'format_oclc.xslt':

<!DOCTYPE stylesheet >
<xsl:stylesheet version="1.0" xmlns:marc="http://www.loc.gov/MARC21/slim"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

    <xsl:template name="format_OCLC_number">
        <xsl:param name="controlnumber"/>
        <xsl:variable name="OCLC_number"
select="substring-after($controlnumber,'(OCoLC)')"/>
        <xsl:variable name="OCLC_length"
select="string-length($OCLC_number)"/>
        <xsl:if test="$OCLC_number">
            <xsl:choose>
                <xsl:when test="$OCLC_length &lt; 8">
                    <xsl:variable name="OCLC_number_padding"
select="substring( '00000000', 1, 8 - $OCLC_length)"/>
                    <xsl:variable name="formatted_OCLC_number"
select="concat( $OCLC_number_padding, $OCLC_number )"/>
                    <xsl:value-of select="concat( 'ocm',
$formatted_OCLC_number )" />
                </xsl:when>
                <xsl:when test="$OCLC_length = 8">
                    <xsl:value-of select="concat( 'ocn', $OCLC_number )" />
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="concat( 'on', $OCLC_number )" />
                </xsl:otherwise>
            </xsl:choose>
        </xsl:if>
    </xsl:template>

    <xsl:template match="marc:record">
        <xsl:for-each select="marc:datafield[@tag=773]">
            <xsl:for-each select="current()/marc:subfield[@code='w']">
                <xsl:call-template name="format_OCLC_number">
                    <xsl:with-param name="controlnumber"
select="current()"/>
                </xsl:call-template>
                <xsl:value-of select="'
                '" />
            </xsl:for-each>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

we can run xsltproc:

$ xsltproc format_oclc.xslt format_oclc.marcxml
<?xml version="1.0"?>
ocm02776588                 ocn12776588                 on112776588
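
For comparison, the same rule can be sketched outside XSLT. This is only a
minimal illustration in Python, assuming the thresholds used in the template
above (fewer than 8 digits: zero-pad to 8 and prefix 'ocm'; exactly 8 digits:
'ocn'; anything longer: 'on'). The function name format_oclc_number is
hypothetical and is not part of Koha.

```python
def format_oclc_number(control_number: str) -> str:
    """Mirror the format_OCLC_number XSLT template (hypothetical helper)."""
    # Like substring-after(): take everything after the first "(OCoLC)".
    _, marker, number = control_number.partition("(OCoLC)")
    if not marker or not number:
        return ""  # the template emits nothing when there is no OCLC number
    if len(number) < 8:
        # Left-pad with zeros to 8 characters, as the XSLT padding does.
        return "ocm" + number.rjust(8, "0")
    if len(number) == 8:
        return "ocn" + number
    return "on" + number


if __name__ == "__main__":
    for w in ("(OCoLC)2776588", "(OCoLC)12776588", "(OCoLC)112776588"):
        print(format_oclc_number(w))
```

Run on the three $w values from format_oclc.marcxml, this should print the
same three strings as the xsltproc run above, one per line.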


I'll be filing a bug/patch relatively soon that incorporates this.


On Sat, May 26, 2018 at 4:58 AM, Katrin Fischer <katrin.fischer.83 at web.de>
wrote:

> I don't know about other sources, but in Germany I've never encountered
> data where the numbers in $w minus MarcOrgCode don't match the one in 001.
> Maybe this problem is just specific to OCLC?
>
> Katrin
>
> On 25.05.2018 20:06, Barton Chittenden wrote:
>
>
>
> On Thu, May 24, 2018 at 3:34 PM, Katrin Fischer <katrin.fischer.83 at web.de>
> wrote:
>
>> Hi Barton,
>>
>> Control-number is the index on 001. 001 should have the number and 003
>> the MarcOrgCode, that's why it's stripped from $w for search. I don't know
>> about OCLC's practices, so can't tell how numbers are handled there. The
>> examples here show a number with ocm in 001:
>>
>> http://www.loc.gov/marc/bibliographic/bd001.html
>>
> From that link:
>
> "Contains the control number assigned by the organization creating, using,
> or distributing the record. For interchange purposes, documentation of the
> structure of the control number and input conventions should be provided to
> exchange partners by the organization initiating the interchange."
>
>
> That potentially means that we would have to write XSLT to transform the
> links in $w for each 'exchange partner' -- i.e. test the Marc Org Code,
> then apply a bunch of rules to generate a value that we can search for.
>
> The examples don't leave me brimming with confidence that most exchange
> partners will use the same format for $w (after the Org Code) as for the
> 001:
>
> *001* #880524405##
> 003 CaOONL
>
> *001* ###86104385#
> 003 DLC
>
> *001* ocm14919759
> 003 OCoLC
>
> *001* #####9007496
> 003 DNLM
>
>
>> The description for $w
>> (http://www.loc.gov/marc/bibliographic/bd76x78x.html) doesn't have a
>> matching example:
>>
>> "System control number of the related record preceded by the MARC code,
>> enclosed in parentheses, for the agency to which the control number
>> applies."
>>
>> Hope this helps,
>>
>> Katrin
>>
> Well, at the very least, it lets me know what I'm getting myself into.
>
> I wonder if there's a way of translating the values found in $w into 001
> outside of XSLT -- that's a language not well suited to the task. Could we
> do it in Perl, and stash the results in some 9XX field?
>
> I was kind of hoping that we would be able to use whatever we got back
> from extractControlNumber as a base for any transformations. That may or
> may not be a safe assumption.
>
>
>>
>>
>
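
The per-partner idea quoted above (test the MARC Org Code, then apply that
partner's rules) could be organised as a lookup table of formatter functions.
The following is a rough sketch in Python, used here only for illustration.
FORMATTERS, W_PATTERN and control_number_from_w are hypothetical names, and
nothing here reflects how Koha's extractControlNumber actually behaves. Only
the OCoLC rule from the XSLT template in this message is filled in; any other
Org Code falls back to the bare number.

```python
import re


def format_oclc(number: str) -> str:
    """OCLC rule as written in the XSLT template in this message."""
    if len(number) < 8:
        return "ocm" + number.rjust(8, "0")
    if len(number) == 8:
        return "ocn" + number
    return "on" + number


# Hypothetical registry: MARC Org Code -> function that turns the number
# found in $w into the form expected in 001.
FORMATTERS = {"OCoLC": format_oclc}

# "(OrgCode)number", as described for $w in the 76X-78X documentation.
W_PATTERN = re.compile(r"^\((?P<org>[^)]+)\)(?P<number>.+)$")


def control_number_from_w(subfield_w: str) -> str:
    """Return a searchable control number for a $w value (sketch only)."""
    value = subfield_w.strip()
    match = W_PATTERN.match(value)
    if match is None:
        return value  # no "(OrgCode)" prefix: leave the value untouched
    # Unknown Org Codes fall back to the number with the prefix stripped.
    formatter = FORMATTERS.get(match["org"], lambda number: number)
    return formatter(match["number"])
```

Adding another exchange partner would then mean adding one entry to
FORMATTERS rather than another branch in the stylesheet.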