User Behavior Log Tracking

Author: 添码座 (original article)

Data Format

The data reported by the client-side tracking code has the following format.

{
    "uid": 2088,                     # User ID
    "did": "182f46a-3efb97-a276f",   # Device ID
    "platform": 1,                   # Device type, 0: Android, 1: iOS, 2: PC, 3: Applet
    "ver": "1.5.8",                  # Major version number
    "code": "20181014",              # Minor version number
    "net": 4,                        # Network type, 0: unknown, 1: WIFI, 2: 3G, 3: 4G, 4: 5G
    "brand": "iPhone",               # Phone brand
    "model": "iPhone10",             # Device model
    "display": "2436×1125",          # Screen resolution
    "os": "ios16.3",                 # OS version
    "data": [                        # User behavior data; all fields above are common fields
        # type=1: the app was opened; adstatus: splash-ad display status; loadingtime: ad loading time
        {"type":1, "timestamp":1564470454746, "adstatus":1, "loadingtime":100},
        # type=2: a product was clicked for browsing; id: product ID; location: position of the product in the list, starting from 0
        {"type":2, "timestamp":1564670685472, "id":"123456", "location":3},
        # type=3: the product detail page was viewed; id: product ID; staytime: time spent on the page
        {"type":3, "timestamp":1564670685472, "id":"123456", "staytime":5860},
        ......
    ]
}
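The platform and net fields carry small integer codes, so a consumer will usually translate them into labels and reject anything outside the documented range. Below is a minimal plain-Java sketch of that lookup, assuming the code tables in the comments above are complete; the class `CommonFieldCodes` and its methods are hypothetical helpers, not part of the original logging service.

```java
import java.util.List;

/**
 * Hypothetical helper: maps the numeric platform and net codes from the data
 * format to labels and rejects values outside the documented ranges.
 */
public class CommonFieldCodes {
    // Index = code, following the comments in the sample record
    private static final List<String> PLATFORMS = List.of("Android", "iOS", "PC", "Applet");
    private static final List<String> NETWORKS = List.of("unknown", "WIFI", "3G", "4G", "5G");

    /** Label for a platform code in 0..3; throws IllegalArgumentException otherwise. */
    public static String platformLabel(int code) {
        return lookup(PLATFORMS, code, "platform");
    }

    /** Label for a net code in 0..4; throws IllegalArgumentException otherwise. */
    public static String netLabel(int code) {
        return lookup(NETWORKS, code, "net");
    }

    private static String lookup(List<String> labels, int code, String field) {
        if (code < 0 || code >= labels.size()) {
            throw new IllegalArgumentException(field + " code out of range: " + code);
        }
        return labels.get(code);
    }

    public static void main(String[] args) {
        System.out.println(platformLabel(1) + " / " + netLabel(4)); // iOS / 5G
    }
}
```

Using a list indexed by code keeps the tables in the same order as the comments, so adding a new code only means appending a label.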

额外说明

In fact, fields such as type and timestamp in each log record sit at the same level as uid, like this:

{
    "uid": 2088,                     # User ID
    "did": "182f46a-3efb97-a276f",   # Device ID

    ......

    "type":1,
    "timestamp":1564670685472,

    ......
}

They were placed inside data above only to make the presentation clearer.
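The relationship between the two forms can be made concrete: each element of data becomes one flat record that also carries every common field. The sketch below uses plain `Map`s instead of a JSON library so that it stays self-contained; the names `FlattenEvents` and `flatten` are illustrative assumptions, and when an event field has the same name as a common field, the event value wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turn common fields + a data array into one flat record per event. */
public class FlattenEvents {

    public static List<Map<String, Object>> flatten(Map<String, Object> common,
                                                    List<Map<String, Object>> events) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map<String, Object> event : events) {
            // LinkedHashMap keeps the common fields first, then the event fields
            Map<String, Object> record = new LinkedHashMap<>(common);
            record.remove("data");   // the array itself is not part of a flat record
            record.putAll(event);    // type, timestamp, ... now sit beside uid
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, Object> common = new LinkedHashMap<>();
        common.put("uid", 2088);
        common.put("did", "182f46a-3efb97-a276f");

        Map<String, Object> open = new LinkedHashMap<>();
        open.put("type", 1);
        open.put("timestamp", 1564470454746L);
        open.put("adstatus", 1);
        open.put("loadingtime", 100);

        for (Map<String, Object> r : flatten(common, List.of(open))) {
            // {uid=2088, did=182f46a-3efb97-a276f, type=1, timestamp=1564470454746, adstatus=1, loadingtime=100}
            System.out.println(r);
        }
    }
}
```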

For now, five user behavior types are defined.

Type    Name            Description
1       app_open        App opened
2       goods_click     Product clicked for browsing
3       goods_item      Product detail page data
4       goods_list      Product list page data
5       app_collapse    App crash data
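On the consuming side it helps to keep the numeric code and the name from this table in one place. A minimal sketch as a Java enum follows; the `Action` type and its `fromCode` method are hypothetical, assuming the five codes above are the full set.

```java
/**
 * Hypothetical mapping between the numeric "type" code and the behavior name
 * from the behavior-type table.
 */
public class ActionTypes {

    public enum Action {
        APP_OPEN(1, "app_open"),
        GOODS_CLICK(2, "goods_click"),
        GOODS_ITEM(3, "goods_item"),
        GOODS_LIST(4, "goods_list"),
        APP_COLLAPSE(5, "app_collapse");

        public final int code;
        public final String logName;

        Action(int code, String logName) {
            this.code = code;
            this.logName = logName;
        }

        /** Finds the action for a numeric type value; throws if the code is unknown. */
        public static Action fromCode(int code) {
            for (Action a : values()) {
                if (a.code == code) {
                    return a;
                }
            }
            throw new IllegalArgumentException("unknown type code: " + code);
        }
    }

    public static void main(String[] args) {
        System.out.println(Action.fromCode(3).logName); // goods_item
    }
}
```

A linear scan over five constants is simpler than a lookup map and fast enough at this size.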

The generated log data is saved by the logging service endpoint to the file /home/work/logs/ua.log.
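How the logging service writes this file is not shown here. Because the Flume exec source used for collection runs `tail -F` and turns each line into one event, a reasonable assumption is one flat JSON record per line. The following sketch appends records in that layout with `java.nio.file.Files`; `UaLogWriter` and `appendRecords` are hypothetical names, and the default path is simply the one mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical writer: appends one JSON record per line to the behavior log. */
public class UaLogWriter {

    /** Appends each string as its own line, creating the file and its directory if needed. */
    public static void appendRecords(Path logFile, List<String> jsonLines) throws IOException {
        Path parent = logFile.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Files.write adds a line separator after every element
        Files.write(logFile, jsonLines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the text; pass another path as the first argument to test locally
        Path log = Paths.get(args.length > 0 ? args[0] : "/home/work/logs/ua.log");
        appendRecords(log, List.of("{\"uid\":2088,\"type\":1,\"timestamp\":1564470454746}"));
    }
}
```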

Data Generation

You can also ask ChatGPT to write a piece of Java code that generates the test data you need.

The prompt is not hard to write either; here is an example.

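You are a Java engineer. Please write a piece of Java code that generates JSON data in a loop. The JSON data has the following format:
1. Every JSON record contains the fields uid, did, platform, ver, code, net, brand, model, display, os, type, timestamp, adstatus, and loadingtime.
2. uid is a random long integer of type BIGINT.
3. did is a random 32-character UUID string; the "-" characters must be removed from the generated UUID.
4. platform is the device type, one of Android, IOS, PC, and APP.
5. ver is the major version number. It consists of three groups of digits separated by '.', for example '1.2.3', and each group is between 1 and 9.
6. code is the minor version number. It is a date string such as 20240101 and can be any day in 2023 or 2024.
7. net is the network type: 0 means unknown, 1 WIFI, 2 3G, 3 4G, and 4 5G. Its value is random between 0 and 4.
8. brand is the phone brand, covering the mainstream brands currently popular on the market.
9. model is a random phone model of the corresponding brand.
10. display is the phone's screen resolution and matches the model.
11. os is the phone's OS version; use the actual version of the corresponding model.
12. type is one of app_open, goods_click, goods_item, goods_list, and app_collapse.
13. timestamp is the current timestamp.
14. adstatus indicates whether the ad was opened: 0 means not opened and 1 means opened. Its value is random, either 0 or 1.
15. loadingtime is the screen loading time, a random value between 1000 and 2000, in milliseconds.
......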

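Apart from replacing the org.json library that ChatGPT used with fastjson by hand, the code can mostly be used as generated, but it should still be reviewed before use.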

Below is the complete Java code generated by ChatGPT.

/**
 * Test-data generator. The code was produced by ChatGPT and then reviewed by hand;
 * this comment and the package declaration were added manually.
 */
package com.imooc.useraction;

import com.alibaba.fastjson.JSONObject;
import java.util.Random;
import java.util.UUID;

public class JSONGenerator {
    private static final String[] PLATFORMS = {"Android", "IOS", "PC", "APP"};
    // BRANDS, MODELS, DISPLAYS and OS_VERSIONS are parallel arrays: index i describes one device
    private static final String[] BRANDS = {"Samsung", "Apple", "Huawei", "Xiaomi", "Oppo", "Vivo"};
    private static final String[] MODELS = {"Galaxy S21", "iPhone 13", "P40", "Mi 11", "Reno 5", "X60"};
    private static final String[] DISPLAYS = {"1080x2400", "1170x2532", "1080x2340", "1440x3200", "1080x2408", "1080x2400"};
    private static final String[] OS_VERSIONS = {"11", "15", "10", "11", "12", "14"};

    private static final Random random = new Random();

    public static void main(String[] args) {
        // Print 10 records, one JSON object per line
        for (int i = 0; i < 10; i++) {
            JSONObject jsonObject = generateRandomJSON();
            System.out.println(jsonObject.toString());
        }
    }

    private static JSONObject generateRandomJSON() {
        JSONObject json = new JSONObject();
        // Use one index for brand, model, display and os so the device fields match each other
        int device = random.nextInt(BRANDS.length);

        json.put("uid", random.nextLong() & Long.MAX_VALUE); // clear the sign bit so uid is non-negative
        json.put("did", UUID.randomUUID().toString().replace("-", ""));
        json.put("platform", PLATFORMS[random.nextInt(PLATFORMS.length)]);
        json.put("ver", generateRandomVersion());
        json.put("code", generateRandomDate());
        json.put("net", random.nextInt(5));                   // 0..4
        json.put("brand", BRANDS[device]);
        json.put("model", MODELS[device]);
        json.put("display", DISPLAYS[device]);
        json.put("os", OS_VERSIONS[device]);
        json.put("type", generateRandomType());
        json.put("timestamp", System.currentTimeMillis());
        json.put("adstatus", random.nextInt(2));              // 0 or 1
        json.put("loadingtime", 1000 + random.nextInt(1001)); // 1000..2000 ms
        return json;
    }

    private static String generateRandomVersion() {
        // Parentheses keep each part an int in 1..9; without them, the "+ 1" that follows
        // a string is appended as the text "1" (producing versions like "5.31.71")
        return (random.nextInt(9) + 1) + "." + (random.nextInt(9) + 1) + "." + (random.nextInt(9) + 1);
    }

    private static String generateRandomDate() {
        int year = random.nextInt(2) + 2023;  // 2023 or 2024
        int month = random.nextInt(12) + 1;
        int day = random.nextInt(28) + 1;     // capped at 28 so every month is valid
        return String.format("%d%02d%02d", year, month, day);
    }

    private static String generateRandomType() {
        String[] types = {"app_open", "goods_click", "goods_item", "goods_list", "app_collapse"};
        return types[random.nextInt(types.length)];
    }
}

Once you have the Java code, you can keep modifying it to suit your needs until you are satisfied.
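One modification worth considering: following the prompt, the generated records store platform and type as strings (for example "APP" or "app_open"), while the format at the top of this article uses numeric codes. The sketch below is one hypothetical way to emit the numeric form using only the JDK. `NumericUaGenerator` and its methods are assumptions, it covers only a subset of the fields, and the value ranges are taken from the Data Format section.

```java
import java.util.Random;
import java.util.UUID;

/**
 * Hypothetical generator variant: emits a subset of the fields with the numeric
 * codes of the documented format (platform 0..3, net 0..4, type 1..5).
 */
public class NumericUaGenerator {
    private static final Random RANDOM = new Random();

    /** Builds "x.y.z" with each part in 1..9; parentheses keep the additions numeric. */
    public static String randomVersion(Random r) {
        return (r.nextInt(9) + 1) + "." + (r.nextInt(9) + 1) + "." + (r.nextInt(9) + 1);
    }

    /** One flat JSON record on a single line; no spaces after ':' */
    public static String randomRecord(Random r) {
        long uid = r.nextLong() & Long.MAX_VALUE;  // non-negative user ID
        String did = UUID.randomUUID().toString().replace("-", "");
        int platform = r.nextInt(4);               // 0:Android 1:iOS 2:PC 3:Applet
        int net = r.nextInt(5);                    // 0:unknown ... 4:5G
        int type = r.nextInt(5) + 1;               // 1..5, see the behavior-type table
        return String.format(
                "{\"uid\":%d,\"did\":\"%s\",\"platform\":%d,\"ver\":\"%s\",\"net\":%d,"
                        + "\"type\":%d,\"timestamp\":%d}",
                uid, did, platform, randomVersion(r), net, type, System.currentTimeMillis());
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            System.out.println(randomRecord(RANDOM));
        }
    }
}
```

Building the string with `String.format` avoids a JSON dependency here; for records with nested or escaped values, a JSON library such as the fastjson used above is the safer choice.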

Data Collection

Flume is used to collect the log data reported by the clients.

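# First delete all other configuration files (this removes the whole conf directory, including Flume's default files)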
> rm -rf /home/work/flume-1.11.0/conf
> mkdir /home/work/flume-1.11.0/conf
> cd /home/work/flume-1.11.0/conf
> vi ua-data-to-hdfs.conf

# The agent is named a1
# Name the source, channel, and sink components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/work/logs/ua.log

# Configure the interceptor: copy the single-digit type value into an event header named act
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = "type":(\\d)
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.name = act

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://server01:9000/data/ods/ua/%Y%m%d/%{act}
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Add a file name prefix and suffix
a1.sinks.k1.hdfs.filePrefix = ua
a1.sinks.k1.hdfs.fileSuffix = .log

# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
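To see how a single log line ends up in a directory, the interceptor and the sink's path template can be modeled in a few lines of Java. This is a simplified illustration, not Flume's implementation: it assumes the interceptor regex targets the numeric `type` field of the flat record (`"type":(\d)`), uses `Matcher.find()` to take the first match, and omits the `hdfs://server01:9000` prefix. `UaRouting`, `extractAct`, and `hdfsDir` are hypothetical names. With `useLocalTimeStamp = true`, the date part comes from the agent's clock when the event is written, not from the record's own timestamp field. A record whose type is a string such as "app_open" produces no act value; how Flume fills the path in that case is not modeled here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model: regex_extractor header + hdfs.path escaping for one log line. */
public class UaRouting {
    // Same pattern as the interceptor; in the .conf file "\\d" is read as "\d"
    private static final Pattern TYPE = Pattern.compile("\"type\":(\\d)");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Returns the first captured type digit, or empty if the line has none. */
    public static Optional<String> extractAct(String line) {
        Matcher m = TYPE.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Mirrors hdfs.path = .../data/ods/ua/%Y%m%d/%{act} for a given local date. */
    public static String hdfsDir(LocalDate day, String act) {
        return "/data/ods/ua/" + day.format(DAY) + "/" + act;
    }

    public static void main(String[] args) {
        String line = "{\"uid\":2088,\"type\":3,\"timestamp\":1564670685472,\"id\":\"123456\"}";
        String act = extractAct(line).orElse("");
        System.out.println(hdfsDir(LocalDate.of(2019, 8, 1), act)); // /data/ods/ua/20190801/3
    }
}
```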

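Start the Flume agent with the following commands. They are run from the Flume installation directory so that the relative conf paths resolve correctly.

> cd /home/work/flume-1.11.0
> nohup bin/flume-ng agent --name a1 --conf conf --conf-file conf/ua-data-to-hdfs.conf &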

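After that, the imported files can be listed in HDFS. Because the sink writes into per-date and per-act subdirectories, -R lists them recursively.

> hdfs dfs -ls -R /data/ods/ua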
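How many files appear in each directory is governed by the three roll settings in the sink configuration: `rollInterval = 3600` seconds (one hour), `rollSize = 134217728` bytes (128 × 1024 × 1024, i.e. 128 MiB), and `rollCount = 0`. According to the Flume documentation, 0 disables the corresponding trigger, so here a file is closed after an hour or at 128 MiB, whichever comes first; while a file is still open, it carries the in-use suffix (`hdfs.inUseSuffix`, default `.tmp`). The sketch below is a simplified model of that rule, not Flume's code; `RollPolicy` and `shouldRoll` are hypothetical names.

```java
/**
 * Hypothetical, simplified model of the HDFS sink's three roll triggers;
 * a value of 0 disables the corresponding trigger.
 */
public class RollPolicy {
    public final long rollIntervalSec;
    public final long rollSizeBytes;
    public final long rollCount;

    public RollPolicy(long rollIntervalSec, long rollSizeBytes, long rollCount) {
        this.rollIntervalSec = rollIntervalSec;
        this.rollSizeBytes = rollSizeBytes;
        this.rollCount = rollCount;
    }

    /** True if any enabled trigger has been reached for the currently open file. */
    public boolean shouldRoll(long openSeconds, long bytesWritten, long eventsWritten) {
        boolean byTime = rollIntervalSec > 0 && openSeconds >= rollIntervalSec;
        boolean bySize = rollSizeBytes > 0 && bytesWritten >= rollSizeBytes;
        boolean byCount = rollCount > 0 && eventsWritten >= rollCount;
        return byTime || bySize || byCount;
    }

    public static void main(String[] args) {
        // Values from the sink configuration: 1 hour, 128 MiB, count-based rolling off
        RollPolicy p = new RollPolicy(3600, 128L * 1024 * 1024, 0);
        System.out.println(p.rollSizeBytes);                         // 134217728
        System.out.println(p.shouldRoll(600, 1_000_000, 5_000_000)); // false
        System.out.println(p.shouldRoll(3600, 1_000_000, 10));       // true
    }
}
```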
