egmkang, server-side development engineer

Performance problems caused by overusing String in C#

2016-08-03
C#

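A couple of days ago I wrote a parsing function for our JSON data. Until then we had been using a full-fledged JSON parser that supports every JSON feature. In practice, though, the only feature we use is a key-value mapping, and the values are only ever numbers or strings. Since parsing was fairly slow, I decided to parse the text myself with plain string operations. When the first working prototype was done, it was about as fast as the JSON parser. Profiling showed that the vast majority of the time was spent constructing String objects and searching with IndexOf.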
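For reference, a hand-rolled parser for this restricted format could look like the minimal sketch below. It assumes a single flat object with string keys, number-or-string values, and no escape sequences inside strings; `FlatJson.Parse` is a hypothetical name, not the author's prototype. It still calls `Substring` for every key and value, which is exactly the kind of String construction the profile pointed at.

```csharp
using System.Collections.Generic;

public static class FlatJson
{
    // Hypothetical parser for a flat object such as {"hp":100,"name":"abc"}.
    // Assumes: no nesting, no escape sequences inside strings.
    // Values are returned as raw text (quotes stripped from string values).
    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        int i = text.IndexOf('{') + 1;
        while (true)
        {
            int keyStart = text.IndexOf('"', i);
            if (keyStart < 0) break;                         // no more keys
            int keyEnd = text.IndexOf('"', keyStart + 1);
            string key = text.Substring(keyStart + 1, keyEnd - keyStart - 1);

            int colon = text.IndexOf(':', keyEnd + 1);
            int v = colon + 1;
            while (char.IsWhiteSpace(text[v])) ++v;          // skip whitespace after ':'

            string value;
            if (text[v] == '"')
            {
                // String value: everything up to the next quote (commas allowed).
                int valueEnd = text.IndexOf('"', v + 1);
                value = text.Substring(v + 1, valueEnd - v - 1);
                i = valueEnd + 1;
            }
            else
            {
                // Number value: everything up to the next ',' or '}'.
                int valueEnd = v;
                while (valueEnd < text.Length && text[valueEnd] != ',' && text[valueEnd] != '}') ++valueEnd;
                value = text.Substring(v, valueEnd - v).Trim();
                i = valueEnd;
            }
            result[key] = value;
        }
        return result;
    }
}
```

Each `Substring` call allocates a new string and copies characters, so the cost grows with the number of fields even though the input string itself is never modified.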

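I downloaded the coreclr source to dig into it, and found that String.Split first scans the whole string once to count how many elements there will be, then allocates an array, and only then performs the actual split. The split itself allocates a new String for every piece and copies the characters into it. This surprised me: String is immutable in both C# and Java, so why don't they construct a substring as a slice over the original string?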
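To make the slice idea concrete, here is a minimal sketch that relies only on the standard `String.IndexOf(char, int)` overload: instead of copying each field out, it records every field as an (offset, length) pair into the original string. `Segment` and `SliceSplit` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical value type: one field, described by its position in the source string.
public struct Segment
{
    public readonly int Start;
    public readonly int Length;
    public Segment(int start, int length) { Start = start; Length = length; }
}

public static class SliceSplit
{
    // Fills `result` (cleared first, so it can be reused) with one Segment per field.
    // No strings are allocated; like String.Split, a trailing empty field is kept.
    public static List<Segment> Split(string s, char separator, List<Segment> result)
    {
        result.Clear();
        int start = 0;
        int pos;
        while ((pos = s.IndexOf(separator, start)) >= 0)
        {
            result.Add(new Segment(start, pos - start));
            start = pos + 1;
        }
        result.Add(new Segment(start, s.Length - start)); // last field, possibly empty
        return result;
    }
}
```

For `"a,bb,"`, `"a,bb,".Split(',')` allocates an array plus three new strings, while `SliceSplit.Split("a,bb,", ',', list)` yields the segments (0,1), (2,2) and (5,0); once the reused list has grown to the needed capacity, it allocates nothing.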

I searched online and didn't find anyone who had written a StringSlice or anything similar, so I quickly put together a StringView, which is a read-only StringSlice.

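// Requires unsafe code to be enabled (csc /unsafe, or <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file).
using System.Collections.Generic;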

public unsafe struct StringView
{
    public static readonly StringView Empty = new StringView("");

    public StringView(string str) : this(str, 0, str.Length) { }

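    // Creates a view of str[begin .. begin + length) without copying any characters.
    public StringView(string str, int begin, int length)
    {
        if (str == null) throw new System.ArgumentNullException("str");
        // An empty view at the very end of the string (begin == str.Length, length == 0)
        // is allowed, so SubString(this.Length) returns an empty view instead of throwing.
        // A negative length is rejected.
        if (begin < 0 || length < 0 || begin > str.Length - length)
        {
            throw new System.ArgumentOutOfRangeException("begin", "StringView's Constructor OutOfBound");
        }

        this.str = str;
        this.begin = begin;
        this.length = length;
    }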

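    // Returns the position (relative to this view) of the first c at or after start, or -1.
    public int IndexOf(char c, int start = 0)
    {
        // Reject a negative start: with raw pointers it would read before the view.
        if (start < 0) throw new System.ArgumentOutOfRangeException("start");

        fixed (char* p = this.str)
        {
            for (int i = start; i < length; ++i)
            {
                if (p[this.begin + i] == c) return i;
            }
        }

        return -1;
    }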

    private static bool ArrayContains(char[] array, char c)
    {
        int length = array.Length;
        fixed (char* p = array)
        {
            for (int i = 0; i < length; ++i)
                if (p[i] == c) return true;
        }

        return false;
    }

    public int IndexOf(char[] array, int start = 0)
    {
        if (array.Length == 1) return this.IndexOf(array[0], start);

        fixed (char* p = this.str)
        {
            for (int i = start; i < length; ++i)
            {
                if (ArrayContains(array, p[this.begin + i])) return i;
            }
        }

        return -1;
    }

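    public int IndexOf(string s, int start = 0)
    {
        // Like String.IndexOf, the empty string matches at the start position.
        if (s.Length == 0) return start;

        int s2_length = s.Length;
        fixed (char* p1 = this.str)
        {
            fixed (char* p2 = s)
            {
                // Find each candidate position of the first character, then compare the rest.
                int index = this.IndexOf(p2[0], start);
                while (index >= 0)
                {
                    // Bound the match by the view's length, not the underlying string's,
                    // so a match can never run past the end of this view.
                    if (s2_length > this.length - index)
                        return -1;
                    bool match = true;
                    for (int i = 1; i < s2_length; ++i)
                    {
                        if (p1[this.begin + index + i] != p2[i]) { match = false; break; }
                    }
                    if (match) return index;

                    index = this.IndexOf(p2[0], index + 1);
                }
                return -1;
            }
        }
    }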

    public unsafe char this[int index]
    {
        get
        {
            if (index < 0 || index >= this.length)
            {
                throw new System.Exception("StringView's Index OutOfBound");
            }

            fixed (char* p = this.str)
            {
                return p[this.begin + index];
            }
        }
    }

    public StringView SubString(int begin)
    {
        return this.SubString(begin, this.length - begin);
    }

    public StringView SubString(int begin, int length)
    {
        return new StringView(this.str, this.begin + begin, length);
    }

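    // Splits this view on `split` into `array`, which is cleared first so it can be reused.
    // No new strings are created: every element is a view into the same underlying string.
    // Note: a trailing empty segment (e.g. in "a,b,") is dropped, unlike String.Split.
    public List<StringView> Split(char split, List<StringView> array)
    {
        array.Clear();

        int pos1 = 0;
        int pos2 = this.IndexOf(split);
        // pos2 >= 0 (not > 0), so a separator at position 0 yields a leading empty element.
        while (pos2 >= 0)
        {
            array.Add(new StringView(str, this.begin + pos1, pos2 - pos1));
            pos1 = pos2 + 1;
            pos2 = this.IndexOf(split, pos1);
        }
        if (pos1 != this.length) array.Add(new StringView(str, this.begin + pos1, this.length - pos1));

        return array;
    }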

    public override bool Equals(object obj)
    {
        if (obj is StringView)
        {
            StringView v = (StringView)obj;
            return this.Equals(v);
        }
        return false;
    }

    public bool Equals(StringView v)
    {
        if (v.Length != this.Length) return false;
        for (int i = 0; i < this.Length; ++i)
        {
            if (this[i] != v[i]) return false;
        }
        return true;
    }

    internal static int CombineHashCodes(int h1, int h2)
    {
        return (((h1 << 5) + h1) ^ h2);
    }

    public override int GetHashCode()
    {
        int hash_code = 0;
        for (int i = 0; i < this.length; ++i)
        {
            hash_code = CombineHashCodes(hash_code, this[i].GetHashCode());
        }
        return hash_code;
    }

    public int Length { get { return this.length; } }

    public override string ToString()
    {
        return this.str.Substring(begin, length);
    }

    public string GetRawString() { return this.str; }
    public int GetBegin() { return this.begin; }

    private string str;
    private int begin;
    private int length;
}

To make it easy to swap in for String, most of the method signatures match String's. This version only covers what I needed; later I may add more of String's methods.

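The IndexOf calls I mentioned earlier are also fairly expensive: the String indexer performs a bounds check on every access, and IndexOf kept going through the indexer, which I personally don't think is a good fit. That is why my StringView uses pointers throughout.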
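For comparison, here is a hypothetical safe-code version of the view's character search that goes through the String indexer instead of a pointer. `SliceSearch` and its parameters are illustrative names, not part of StringView. Whether each `s[begin + i]` access actually pays for a bounds check depends on whether the JIT can prove the index is in range; with an offset such as `begin + i` and a loop bound other than `s.Length`, it is unlikely to manage that.

```csharp
public static class SliceSearch
{
    // Hypothetical safe-code equivalent of StringView.IndexOf(char, start):
    // searches s[begin .. begin + length) and returns an index relative to begin, or -1.
    // Every s[...] access goes through the bounds-checked String indexer.
    public static int IndexOf(string s, int begin, int length, char c, int start = 0)
    {
        for (int i = start; i < length; ++i)
        {
            if (s[begin + i] == c) return i;
        }
        return -1;
    }
}
```

This version needs no `/unsafe` switch; the pointer version trades that convenience for skipping the per-access check.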

PS: After the change, the plain-text parse is more than twice as fast as the JSON parse. The performance is decent, and there is still room for improvement.

PS: A more complete StringView is now on GitHub at https://github.com/egmkang/StringView, with added support for ToInt64 and StringBuilder.Append.
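The point of a ToInt64 on a view is to parse a number without first materializing it with `Substring` and then calling `long.Parse`. A minimal sketch of that idea follows, assuming plain decimal input with an optional leading `-`. `SliceNumber.ParseInt64` is a hypothetical helper, not the `ToInt64` API from the repository above, and it skips overflow checking for brevity.

```csharp
public static class SliceNumber
{
    // Hypothetical helper: parses a decimal long from s[begin .. begin + length)
    // without allocating a substring. Accepts an optional leading '-'.
    // Throws FormatException on empty or non-digit input; overflow is NOT checked.
    public static long ParseInt64(string s, int begin, int length)
    {
        if (length <= 0) throw new System.FormatException("empty number");
        int i = begin;
        int end = begin + length;

        bool negative = s[i] == '-';
        if (negative)
        {
            ++i;
            if (i == end) throw new System.FormatException("lone '-'");
        }

        long value = 0;
        for (; i < end; ++i)
        {
            int d = s[i] - '0';
            if (d < 0 || d > 9) throw new System.FormatException("not a digit");
            value = value * 10 + d;
        }
        return negative ? -value : value;
    }
}
```

For example, `SliceNumber.ParseInt64("id=-42;", 3, 3)` reads `-42` directly out of the larger string.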

